# rSpace Data Architecture — Layered Local-First Model
> **Status:** Implemented (Layers 0-2), Designed (Layer 3), Deferred (P2P)
> **Last updated:** 2026-03-02
## Overview
rSpace uses a four-layer data architecture in which plaintext exists only on the user's device. Each layer adds availability and collaboration capabilities while preserving zero-knowledge guarantees for encrypted spaces.
```
Layer 3: Federated Replication (future — user-owned VPS)
Layer 2: Shared Space Sync (collaboration — participant + relay mode)
Layer 1: Encrypted Server Backup (zero-knowledge — cross-device restore)
Layer 0: User's Device (maximum privacy — plaintext only here)
```
---
## Layer 0: User's Device (Maximum Privacy)
The only place plaintext exists for encrypted spaces.
### Storage
- **IndexedDB** via `EncryptedDocStore` — per-document AES-256-GCM encryption at rest
- Database: `rspace-docs` with object stores: `docs`, `meta`, `sync`
### Key Hierarchy
```
WebAuthn PRF output (from passkey)
→ HKDF (salt: "rspace-space-key-v1", info: "rspace:{spaceId}")
→ Space Key (HKDF)
→ HKDF (salt: "rspace-doc-key-v1", info: "doc:{docId}")
→ Doc Key (AES-256-GCM, non-extractable)
```
### Encryption
- `DocCrypto` class handles all key derivation and AES-256-GCM operations
- 12-byte random nonce per encryption
- Keys are non-extractable `CryptoKey` objects (Web Crypto API)
- `EncryptedDocBridge` connects WebAuthn PRF to DocCrypto
### Implementation
- `shared/local-first/crypto.ts` — DocCrypto
- `shared/local-first/storage.ts` — EncryptedDocStore
- `shared/local-first/encryptid-bridge.ts` — PRF-to-DocCrypto bridge
---
## Layer 1: Encrypted Server Backup (Zero-Knowledge)
Server stores opaque ciphertext blobs it cannot decrypt.
### Design Principles
- **Opt-in per user** (default OFF for maximum privacy)
- **Same encryption as Layer 0** — client encrypts before upload
- **Delta-only push** — compare local manifest vs server manifest, upload only changed docs
- **Cross-device restore** — after passkey auth, download all blobs, decrypt locally
### Storage Layout
```
/data/backups/{userId}/{spaceSlug}/{docId-hash}.enc
/data/backups/{userId}/{spaceSlug}/manifest.json
```
### API
```
PUT /api/backup/:space/:docId — upload encrypted blob (10 MB limit)
GET /api/backup/:space/:docId — download encrypted blob
GET /api/backup/:space — list manifest
DELETE /api/backup/:space/:docId — delete specific backup
DELETE /api/backup/:space — delete all for space
GET /api/backup/status — overall backup status
```
All endpoints require EncryptID JWT authentication.
### Client
- `BackupSyncManager` reads already-encrypted blobs from IndexedDB (no double-encryption)
- Auto-backup on configurable interval (default: 5 minutes)
- `pushBackup()` — delta-only upload
- `pullRestore()` — full download for new devices
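The delta-only push can be sketched as a manifest diff. The manifest shape (`docId -> content hash`) and the `uploadBlob` callback are assumptions for illustration, not `BackupSyncManager`'s actual types.

```typescript
// Assumed manifest shape: docId -> content hash of the encrypted blob.
type Manifest = Record<string, string>;

// A doc needs upload when its local hash differs from (or is absent in) the
// server manifest.
function diffManifests(local: Manifest, remote: Manifest): string[] {
  return Object.keys(local).filter((docId) => remote[docId] !== local[docId]);
}

// Hypothetical push loop: upload only the changed docs, return their ids.
async function pushBackup(
  local: Manifest,
  remote: Manifest,
  uploadBlob: (docId: string) => Promise<void>,
): Promise<string[]> {
  const changed = diffManifests(local, remote);
  for (const docId of changed) {
    await uploadBlob(docId); // e.g. PUT /api/backup/:space/:docId
  }
  return changed;
}
```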
### Implementation
- `server/local-first/backup-store.ts` — filesystem blob storage
- `server/local-first/backup-routes.ts` — Hono REST API
- `shared/local-first/backup.ts` — BackupSyncManager
---
## Layer 2: Shared Space Sync (Collaboration)
Multi-document real-time sync over WebSocket.
### Two Operating Modes
#### Participant Mode (unencrypted spaces)
- Server maintains its own copy of each Automerge document
- Full Automerge sync protocol — `receiveSyncMessage` + `generateSyncMessage`
- Server can read, index, validate, and persist documents
- Documents saved as Automerge binary at `/data/docs/{space}/{module}/{collection}/{itemId}.automerge`
#### Relay Mode (encrypted spaces)
- Server forwards encrypted sync messages between peers by `docId`
- Server cannot read document content
- Opaque backup blobs stored via `relay-backup` / `relay-restore` wire protocol
- Stored as `.automerge.enc` files alongside regular docs
### At-Rest Encryption for Module Docs
When a space has `meta.encrypted === true`, module documents are encrypted at rest by the server-side encryption utilities: AES-256-GCM with keys derived via HMAC-SHA256 from `ENCRYPTION_SECRET`.
File format (rSEN):
```
[4 bytes: magic "rSEN" (0x72 0x53 0x45 0x4E)]
[4 bytes: keyId length (uint32)]
[N bytes: keyId (UTF-8)]
[12 bytes: IV]
[remaining: ciphertext + 16-byte auth tag]
```
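A minimal parser for the rSEN container might look like this. One assumption: the format above does not state the endianness of the uint32 length, so this sketch reads it big-endian.

```typescript
interface RsenBlob {
  keyId: string;
  iv: Uint8Array; // 12 bytes
  ciphertext: Uint8Array; // includes the trailing 16-byte GCM auth tag
}

function parseRsen(bytes: Uint8Array): RsenBlob {
  // [0..4) magic "rSEN"
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== 'rSEN') throw new Error('not an rSEN blob');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // [4..8) keyId length (assumed big-endian uint32)
  const keyIdLen = view.getUint32(4);
  const keyIdEnd = 8 + keyIdLen;
  return {
    keyId: new TextDecoder().decode(bytes.subarray(8, keyIdEnd)),
    iv: bytes.subarray(keyIdEnd, keyIdEnd + 12),
    ciphertext: bytes.subarray(keyIdEnd + 12),
  };
}
```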
### Wire Protocol
```
{ type: 'sync', docId, data: number[] }
{ type: 'subscribe', docIds: string[] }
{ type: 'unsubscribe', docIds: string[] }
{ type: 'awareness', docId, peer, cursor?, selection?, username?, color? }
{ type: 'relay-backup', docId, data: number[] } — client → server (opaque blob)
{ type: 'relay-restore', docId, data: number[] } — server → client (stored blob)
{ type: 'ping' } / { type: 'pong' }
```
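The frames above can be modeled as a TypeScript discriminated union, which gives exhaustive, type-safe dispatch on `type`. This is an illustrative typing exercise; `DocSyncManager`'s real types may differ.

```typescript
type WireMessage =
  | { type: 'sync'; docId: string; data: number[] }
  | { type: 'subscribe'; docIds: string[] }
  | { type: 'unsubscribe'; docIds: string[] }
  | {
      type: 'awareness';
      docId: string;
      peer: string;
      cursor?: unknown;
      selection?: unknown;
      username?: string;
      color?: string;
    }
  | { type: 'relay-backup'; docId: string; data: number[] }
  | { type: 'relay-restore'; docId: string; data: number[] }
  | { type: 'ping' }
  | { type: 'pong' };

// Narrowing on `type` lets the compiler check every case is handled.
function describeFrame(msg: WireMessage): string {
  switch (msg.type) {
    case 'sync':
    case 'relay-backup':
    case 'relay-restore':
      return `${msg.type} for ${msg.docId} (${msg.data.length} bytes)`;
    case 'subscribe':
    case 'unsubscribe':
      return `${msg.type} ${msg.docIds.length} doc(s)`;
    case 'awareness':
      return `awareness from ${msg.peer} on ${msg.docId}`;
    case 'ping':
    case 'pong':
      return msg.type;
  }
}
```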
### Implementation
- `shared/local-first/sync.ts` — DocSyncManager (client)
- `server/local-first/sync-server.ts` — SyncServer (server)
- `server/local-first/doc-persistence.ts` — filesystem persistence with encryption
- `server/local-first/encryption-utils.ts` — shared server-side AES-256-GCM primitives
- `server/sync-instance.ts` — SyncServer singleton with encryption wiring
---
## Layer 3: Federated Replication (Future)
Optional replication to user's own infrastructure.
### Design (Not Yet Implemented)
- Same zero-knowledge blobs as Layer 1
- User configures a replication target (their own VPS, S3 bucket, etc.)
- Server pushes encrypted blobs to the target on change
- User can restore from their own infrastructure independently of rSpace
### Prerequisites
- Layer 1 must be proven stable
- User-facing configuration UI
- Replication protocol specification
---
## P2P WebRTC Sync (Future)
Direct peer-to-peer sync as fallback when server is unavailable.
### Design (Not Yet Implemented)
- WebRTC data channels between clients
- Signaling via existing WebSocket connection
- Same Automerge sync protocol as Layer 2
- Useful for: LAN-only operation, server downtime, low-latency collaboration
### Prerequisites
- Layer 1 backup must ship first, since it addresses the primary resilience concern
- WebRTC signaling server or STUN/TURN infrastructure
---
## Threat Model
### What the server knows (unencrypted spaces)
- Full document content (participant mode)
- Document metadata, sync state, member list
### What the server knows (encrypted spaces)
- Space exists, number of documents, document sizes
- Member DIDs (from community doc metadata)
- Timing of sync activity (when peers connect/disconnect)
### What the server CANNOT know (encrypted spaces)
- Document content (encrypted at rest, relay mode)
- Backup blob content (client-encrypted before upload)
- Encryption keys (derived from WebAuthn PRF on device)
### Compromised server scenario
- Attacker gets ciphertext blobs — cannot decrypt without passkey
- Attacker modifies ciphertext — AES-GCM auth tag detects tampering
- Attacker deletes blobs — client has local copy in IndexedDB (Layer 0)
### Compromised device scenario
- Plaintext exposed on that device only
- Other devices are unaffected (no key sharing between devices)
- Passkey revocation invalidates future PRF derivations
---
## Key Rotation
### Current Approach
- Server-side at-rest keys derived from `ENCRYPTION_SECRET` + keyId
- `keyId` stored in community doc `meta.encryptionKeyId`
- Rotation: generate new keyId → re-encrypt all docs → update meta
### Future Approach (with EncryptID Layer 2)
- Client-side key delegation via EncryptID key hierarchy
- Server never has access to plaintext keys
- Rotation managed by space admin through EncryptID
---
## Data Flow Diagrams
### Normal Operation (Unencrypted Space)
```
Client A                      Server                    Client B
   |                             |                          |
   |-- sync(docId, data) ------->|                          |
   |                             |-- sync(docId, data) ---->|
   |                             |-- saveDoc(docId) --> disk|
   |<-- sync(docId, resp) -------|                          |
```
### Relay Mode (Encrypted Space)
```
Client A                      Server                    Client B
   |                             |                          |
   |-- sync(docId, data) ------->|                          |
   |                             |-- sync(docId, data) ---->| (forwarded)
   |-- relay-backup ------------>|                          |
   |                             |-- save .enc blob -> disk |
```
### Backup Restore (New Device)
```
New Device                    Server                Backup Store
   |                             |                        |
   |-- GET /api/backup/space -->|                         |
   |<-- manifest ---------------|                         |
   |-- GET /api/backup/doc ---->|-- load blob ----------->|
   |<-- encrypted blob ---------|<-- blob bytes ----------|
   |                            |                         |
   |  (client decrypts with passkey, writes to IndexedDB) |
```