| id | title | status | assignee | created_date | labels | dependencies | priority |
|---|---|---|---|---|---|---|---|
| task-010 | Netcup RS 8000 infrastructure hardening and maintenance | Done |  | 2026-03-15 07:30 |  |  | high |
Description
High-priority infrastructure tasks for the Netcup RS 8000 production server (20 cores, 64 GB RAM, 3 TB storage) running 40+ live services. Covers security hardening, storage cleanup, monitoring, and reliability improvements.
Acceptance Criteria
- #1 Audit and rotate stale secrets (Infisical + KeePass) — identify unused or old credentials
- #2 Review and harden Traefik config — audited, fixes in task-011 (requires host access)
- #3 Storage cleanup — pruned ~330GB (74%→62% disk), removed 15 dead containers
- #4 Set up automated Docker image pruning — script at /opt/apps/dev-ops/docker-weekly-prune.sh, cron in task-011
- #5 Health check dashboard — audited (179 monitors, 46 unmonitored containers), gaps in task-014
- #6 Backup verification — audited: NO automated backups anywhere, remediation in task-013
- #7 Review container resource limits — added limits to 7 top consumers (postiz x3, p2pwiki-db, elasticsearch, gitea, immich_postgres)
- #8 Update base images — audited, 5 critical/10 high upgrades needed, tracked in task-012
- #13 Upgrade p2pwiki-db MariaDB 10.6→10.11 (backup + upgrade + mariadb-upgrade complete)
- #9 Fix p2p-db CPU (174%→0.02%) — added missing wp_options index, cleaned 15k duplicate rows
- #10 Fix p2pwiki CPU (50%→15%) — blocked Applebot hammering Special/API pages via .htaccess
- #11 Remove junk containers — stopped funny_mirzakhani (cat /dev/urandom), payment-safe-mcp (9149 restarts)
- #12 Vault-migration audit — 20 of 26 secrets confirmed stale, 2 active, 4 unclear. Deletion pending via Infisical UI
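Criterion #10 does not record the exact `.htaccess` rules that were added. Below is a minimal sketch of an equivalent block. It assumes Apache with `mod_rewrite` enabled and that the crawler traffic targeted MediaWiki `Special:` pages and `api.php`. The helper name, its path argument, and the marker comments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a rule returning 403 to Applebot for MediaWiki
# Special: pages and api.php. Assumes Apache with mod_rewrite enabled.
block_applebot_special_pages() {
  local htaccess="$1"   # path to the wiki's .htaccess (not given in the task)
  # Skip if the block is already present, so reruns do not duplicate it.
  if grep -q 'BEGIN applebot-block' "$htaccess" 2>/dev/null; then
    return 0
  fi
  cat >> "$htaccess" << 'EOF'
# BEGIN applebot-block
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Applebot [NC]
RewriteCond %{THE_REQUEST} (Special:|api\.php) [NC]
RewriteRule ^ - [F]
# END applebot-block
EOF
}
```

The rule matches on `%{THE_REQUEST}` (the raw request line) rather than `%{REQUEST_URI}`. MediaWiki often carries the page title in the query string (`index.php?title=Special:...`), and `REQUEST_URI` does not include the query string.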
Audit Results (2026-03-15)
Secrets Audit — 121 secrets, 18 folders
- HIGH: `vault-migration` folder has 26 likely stale secrets (Pusher, Holochain, Obsidian, old Cloudflare tokens, test Stripe keys)
- HIGH: 6+ duplicate secrets across folders (Syncthing x6, GitHub x3, Cloudflare x3, RunPod x2)
- MED: Test/dev keys in prod (Duffel test, Stripe test), 2 orphaned root-level secrets
- Action: Audit each vault-migration key, consolidate duplicates, remove test keys
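The duplicate check can be repeated offline from a flat listing of secret names. The sketch below assumes a hypothetical tab-separated export with one `folder<TAB>SECRET_NAME` line per secret. Producing that listing from Infisical is not covered here. The helper only catches exact name matches, so copies stored under different names still need manual review.

```bash
#!/usr/bin/env bash
# Hypothetical helper. stdin: one "folder<TAB>SECRET_NAME" line per secret.
# stdout: "SECRET_NAME<TAB>count<TAB>folders" for every name seen in 2+ entries.
find_duplicate_secrets() {
  awk -F'\t' '
    NF == 2 { count[$2]++; folders[$2] = folders[$2] " " $1 }
    END {
      for (name in count)
        if (count[name] > 1)
          printf "%s\t%d\t%s\n", name, count[name], substr(folders[name], 2)
    }
  ' | sort
}
```

The output is sorted by name, so repeated runs are easy to diff as duplicates get consolidated.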
Container Health — 303 containers, 161 without health checks
- FIXED: Stopped `funny_mirzakhani` (junk `cat /dev/urandom` container)
- FIXED: Stopped `payment-safe-mcp` (restart loop, 9,149 restarts, no logs)
- FIXED: Removed 15 crashed/init containers
- URGENT: `p2p-db` at 78-132% CPU (MariaDB, investigate queries)
- URGENT: 293 of 303 containers have ZERO resource limits
- Top memory hogs without limits: postiz x3 (~2GB each), p2pwiki-db (1.8GB), gitea (1.7GB)
- `erpnext-queue-long` crashed 7 days ago; needs restart
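Missing health checks and missing memory limits can both be re-checked from `docker inspect`. The sketch below is read-only and assumes the Docker CLI on the host. The function names are hypothetical. It covers running containers only, and memory limits only; CPU limits live in `HostConfig.NanoCpus`.

```bash
#!/usr/bin/env bash
# Read-only inventory: one "name memory_bytes health" line per running container.
# HostConfig.Memory is 0 when no memory limit is set; State.Health is absent
# when no health check is defined for the container.
collect_container_facts() {
  docker ps -q | xargs -r docker inspect --format \
    '{{.Name}} {{.HostConfig.Memory}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}'
}

# Reads collect_container_facts output on stdin and lists the gaps.
report_gaps() {
  awk '
    { name = substr($1, 2) }   # strip the leading "/" in container names
    $2 == 0 { nolimit++; print "no-mem-limit " name }
    $3 == "none" { nohealth++; print "no-healthcheck " name }
    END { printf "totals: %d without memory limit, %d without health check\n", nolimit, nohealth }
  '
}
```

Run it as `collect_container_facts | report_gaps`. To include stopped containers in the totals, change `docker ps -q` to `docker ps -aq`.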
Storage — 2.1 TB used / 3.0 TB (74%)
- IN PROGRESS: Docker prune running (build cache ~347GB, dangling images ~50-80GB)
- 115 dangling volumes (~2.5GB)
- 20+ stopped rspace services idle for 2 weeks
- `payment-infra` rebuild loop generating constant dangling images
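Below is a minimal sketch of what the weekly job from criterion #4 could look like. The real `/opt/apps/dev-ops/docker-weekly-prune.sh` is not reproduced in this task, so several details are assumptions:

- the 7-day (`168h`) retention window
- the `run`/`dry-run` arguments
- the `DRY_RUN` switch

Volumes are left alone on purpose. Once a stack's containers are removed, its data volumes count as unused, and a volume prune would delete them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a weekly prune job, not the script referenced in #4.
set -euo pipefail

# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

weekly_prune() {
  # Build cache was the largest consumer (~347GB); drop entries older than 7 days.
  run docker builder prune --force --filter until=168h
  # Without --all, this removes only dangling (untagged) images.
  run docker image prune --force --filter until=168h
  # Stopped containers older than 7 days.
  run docker container prune --force --filter until=168h
  # Volumes are deliberately not pruned: data volumes of removed stacks count as unused.
  run df -h /
}

case "${1:-}" in
  run) weekly_prune ;;
  dry-run) DRY_RUN=1 weekly_prune ;;
esac
```

With the script at that path, a cron entry such as `0 4 * * 0 /opt/apps/dev-ops/docker-weekly-prune.sh run` would run it on Sundays at 04:00. The schedule is assumed; the actual cron line belongs to task-011. Running it once with `dry-run` prints the commands without executing them.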
Traefik Security — 3 critical, 3 medium (requires HOST access)
- C1: No TLS minimum version (defaults to TLS 1.0)
- C2: No capability drops on Traefik container
- C3: Ports 80/443 on 0.0.0.0 — bypasses Cloudflare
- M1: No rate limiting middleware
- M2: `insecureSkipVerify` on pentagi transport
- M3: No default Content-Security-Policy header
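M3 can be addressed with a Traefik `headers` middleware in the same file-provider directory that the host commands below write to. The sketch assumes `/root/traefik/config` is watched by Traefik's file provider. The middleware name `security-headers` and the helper are hypothetical. The policy is deliberately loose (only `frame-ancestors` and `object-src`), because a strict `default-src` would need tuning per service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a headers middleware covering finding M3.
# Assumes the target directory is loaded by Traefik's file provider.
write_security_headers() {
  local dir="${1:-/root/traefik/config}"
  mkdir -p "$dir"
  cat > "$dir/security-headers.yml" << 'EOF'
http:
  middlewares:
    security-headers:
      headers:
        contentSecurityPolicy: "frame-ancestors 'self'; object-src 'none'"
        contentTypeNosniff: true
EOF
}
```

Routers opt in through a Docker label such as `traefik.http.routers.<router>.middlewares=security-headers@file`. The `@file` suffix tells Traefik that the middleware comes from the file provider.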
Host Commands Needed
Traefik TLS hardening (run on host)
```bash
# Create TLS options file.
# cipherSuites only constrain TLS 1.2; Traefik does not allow configuring TLS 1.3 suites.
cat > /root/traefik/config/tls-options.yml << 'EOF'
tls:
  options:
    default:
      minVersion: VersionTLS12
      cipherSuites:
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
        - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
EOF
```
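Once the TLS options are loaded, the TLS floor can be probed with `openssl s_client`. The sketch assumes it runs on the host against the local listener; probing the public hostname would test Cloudflare's edge rather than Traefik. The helper name and the default address are hypothetical. Traefik applies the TLS option named `default` to every router that does not select another option, so any hostname served by Traefik should give the same result.

```bash
#!/usr/bin/env bash
# Checks that a Traefik endpoint refuses TLS 1.1 and accepts TLS 1.2.
# $1 = hostname sent as SNI; $2 = address to connect to (defaults to the
# local listener, since Traefik is meant to bind 127.0.0.1 only).
check_tls_floor() {
  local host="$1" addr="${2:-127.0.0.1:443}"
  # OpenSSL 3 clients refuse TLS 1.1 on their own at the default security
  # level, so lower it for this probe; otherwise the check passes vacuously.
  if openssl s_client -connect "$addr" -servername "$host" -tls1_1 \
       -cipher 'DEFAULT:@SECLEVEL=0' </dev/null >/dev/null 2>&1; then
    echo "FAIL: $host ($addr) still negotiates TLS 1.1"
    return 1
  fi
  if openssl s_client -connect "$addr" -servername "$host" -tls1_2 \
       </dev/null >/dev/null 2>&1; then
    echo "OK: $host ($addr) requires TLS 1.2 or newer"
    return 0
  fi
  echo "FAIL: $host ($addr) did not complete a TLS 1.2 handshake"
  return 1
}
```

Example: `check_tls_floor wiki.example.org` (hostname hypothetical). If the local OpenSSL build lacks TLS 1.1 support, the first probe fails for that reason alone. A pass is therefore only meaningful where `openssl s_client -help` lists `-tls1_1`.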
Traefik rate limiting (run on host)
```bash
# Average of 100 requests per second per source, with bursts up to 200.
cat > /root/traefik/config/rate-limit.yml << 'EOF'
http:
  middlewares:
    rate-limit:
      rateLimit:
        average: 100
        burst: 200
        period: 1s
EOF
```
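By default, Traefik's `rateLimit` groups requests by the client's remote address. If Cloudflare reaches Traefik through a local tunnel, every request arrives from the same address, and the 100 req/s budget becomes one shared bucket. The tunnel is an assumption, suggested by the plan to bind 80/443 to 127.0.0.1.

The sketch below is a variant keyed on Cloudflare's `CF-Connecting-IP` header. It is only safe once the ports are no longer publicly reachable, because clients could otherwise forge the header. The middleware name `rate-limit-cf` and the helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rate-limit per visitor IP as reported by Cloudflare.
# Only safe when Traefik is reachable solely through Cloudflare.
write_cf_rate_limit() {
  local dir="${1:-/root/traefik/config}"
  mkdir -p "$dir"
  cat > "$dir/rate-limit-cf.yml" << 'EOF'
http:
  middlewares:
    rate-limit-cf:
      rateLimit:
        average: 100
        burst: 200
        period: 1s
        sourceCriterion:
          # Bucket requests by this header instead of the TCP peer address.
          requestHeaderName: CF-Connecting-IP
EOF
}
```

Either middleware takes effect only on routers that reference it, for example with the label `traefik.http.routers.<router>.middlewares=rate-limit-cf@file`.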
Traefik container hardening (edit docker-compose on host)
Add to Traefik service:
```yaml
cap_drop: [ALL]
cap_add: [NET_BIND_SERVICE]
security_opt: [no-new-privileges:true]
read_only: true
tmpfs: [/tmp]
deploy:
  resources:
    limits:
      memory: 512M
      cpus: '2.0'
```
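After the container is recreated, `docker inspect` shows whether these settings took effect. The sketch assumes the container is named `traefik` (hypothetical). `check_hardening` compares against the values above, where 512M is 536870912 bytes.

```bash
#!/usr/bin/env bash
# Print hardening-relevant HostConfig fields for a container as one line:
# CapDrop|CapAdd|ReadonlyRootfs|SecurityOpt|Memory
traefik_hardening_facts() {
  docker inspect --format \
    '{{.HostConfig.CapDrop}}|{{.HostConfig.CapAdd}}|{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.SecurityOpt}}|{{.HostConfig.Memory}}' \
    "${1:-traefik}"
}

# Reads one facts line on stdin; prints each missing setting, returns 1 if any.
check_hardening() {
  local capdrop capadd ro secopt mem missing=0
  IFS='|' read -r capdrop capadd ro secopt mem
  case "$capdrop" in *ALL*) ;; *) echo "missing: cap_drop ALL"; missing=1 ;; esac
  # Newer Docker versions may report the capability as CAP_NET_BIND_SERVICE.
  case "$capadd" in *NET_BIND_SERVICE*) ;; *) echo "missing: cap_add NET_BIND_SERVICE"; missing=1 ;; esac
  [ "$ro" = true ] || { echo "missing: read_only"; missing=1; }
  case "$secopt" in *no-new-privileges*) ;; *) echo "missing: no-new-privileges"; missing=1 ;; esac
  [ "$mem" = 536870912 ] || { echo "missing: 512M memory limit"; missing=1; }
  return "$missing"
}
```

Run it as `traefik_hardening_facts | check_hardening`. If Traefik stores ACME certificates in `acme.json`, that file must sit on a mounted volume, because `read_only: true` makes the container filesystem unwritable.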
Restrict ports to localhost (edit docker-compose on host)
```yaml
ports:
  - "127.0.0.1:80:80"
  - "127.0.0.1:443:443"
```
Then recreate the Traefik container so the compose changes take effect: `cd /root/traefik && docker compose up -d`
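`docker port` confirms that the published ports now bind to loopback only. This matters because Docker publishes ports through its own iptables rules, which take precedence over host firewalls such as ufw. The sketch assumes the container is named `traefik`, and the helper name is hypothetical. It also assumes Cloudflare reaches Traefik through a local tunnel; with a plain proxied DNS record, a loopback-only binding would cut off all traffic.

```bash
#!/usr/bin/env bash
# Reads `docker port <container>` output on stdin (e.g. "443/tcp -> 127.0.0.1:443")
# and reports every mapping whose host address is not loopback; exits 1 if any.
find_public_bindings() {
  awk '
    $2 == "->" && $3 !~ /^127\.0\.0\.1:/ && $3 !~ /^\[::1\]:/ {
      print "public: " $1 " -> " $3
      bad = 1
    }
    END { exit bad }
  '
}
```

Run it as `docker port traefik | find_public_bindings`. No output and exit status 0 mean that both 80 and 443 are loopback-only.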