feat(meeting-intelligence): add backend infrastructure for transcription and AI summaries

Add complete Meeting Intelligence System infrastructure:

Backend Services:
- PostgreSQL schema with pgvector for semantic search
- Transcription service using whisper.cpp and resemblyzer for diarization
- Meeting Intelligence API with FastAPI
- Jibri configuration for recording

API Endpoints:
- /meetings - List, get, delete meetings
- /meetings/{id}/transcript - Get transcripts with speaker attribution
- /meetings/{id}/summary - Generate AI summaries via Ollama
- /search - Full-text and semantic search
- /meetings/{id}/export - Export as PDF, Markdown, JSON
- /webhooks/recording-complete - Jibri callback

Features:
- Zero-cost local transcription (whisper.cpp CPU)
- Speaker diarization (who said what)
- AI-powered summaries with key points, action items, decisions
- Vector embeddings for semantic search
- Multi-format export

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

commit 4cb219db0f (parent f56986818b)

@@ -0,0 +1,17 @@
# Meeting Intelligence System - Environment Variables
# Copy this file to .env and update values

# PostgreSQL
POSTGRES_PASSWORD=your-secure-password-here

# API Security
API_SECRET_KEY=your-api-secret-key-here

# Jibri XMPP Configuration
XMPP_SERVER=meet.jeffemmett.com
XMPP_DOMAIN=meet.jeffemmett.com
JIBRI_XMPP_PASSWORD=jibri-xmpp-password
JIBRI_RECORDER_PASSWORD=recorder-password

# Ollama (uses host.docker.internal by default)
# OLLAMA_URL=http://host.docker.internal:11434

@@ -0,0 +1,151 @@
# Meeting Intelligence System

A fully self-hosted, zero-cost meeting intelligence system for Jeffsi Meet that provides:
- Automatic meeting recording via Jibri
- Local transcription via whisper.cpp (CPU-only)
- Speaker diarization (who said what)
- AI-powered summaries via Ollama
- Searchable meeting archive with dashboard

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                      Netcup RS 8000 (Backend)                       │
│                                                                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐  │
│  │   Jibri     │───▶│   Whisper   │───▶│      AI Processor       │  │
│  │  Recording  │    │ Transcriber │    │  (Ollama + Summarizer)  │  │
│  │  Container  │    │   Service   │    │                         │  │
│  └─────────────┘    └─────────────┘    └─────────────────────────┘  │
│         │                  │                      │                 │
│         ▼                  ▼                      ▼                 │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    PostgreSQL + pgvector                    │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```

## Components

| Service | Port | Description |
|---------|------|-------------|
| PostgreSQL | 5432 | Database with pgvector for semantic search |
| Redis | 6379 | Job queue for async processing |
| Transcriber | 8001 | whisper.cpp + speaker diarization |
| API | 8000 | REST API for meetings, transcripts, search |
| Jibri | - | Recording service (joins meetings as hidden participant) |

## Deployment

### Prerequisites

1. Docker and Docker Compose installed
2. Ollama running on the host (for AI summaries)
3. Jeffsi Meet configured with recording enabled

### Setup

1. Copy environment file:
   ```bash
   cp .env.example .env
   ```

2. Edit `.env` with your configuration:
   ```bash
   vim .env
   ```

3. Create storage directories:
   ```bash
   sudo mkdir -p /opt/meetings/{recordings,audio}
   sudo chown -R 1000:1000 /opt/meetings
   ```

4. Start services:
   ```bash
   docker compose up -d
   ```

5. Check logs:
   ```bash
   docker compose logs -f
   ```
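
Once the containers are up, you can confirm the API and its database connection from the host. A minimal sketch using Python and httpx, assuming the API is published on port 8000:

```python
import httpx

# GET /health reports overall status plus database connectivity
resp = httpx.get("http://localhost:8000/health", timeout=10.0)
resp.raise_for_status()
health = resp.json()  # e.g. {"status": "healthy", "database": True, "version": "1.0.0"}
assert health["database"], "API is up but cannot reach PostgreSQL"
print(f"API v{health['version']} is {health['status']}")
```
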
## API Endpoints
|
||||
|
||||
Base URL: `https://meet.jeffemmett.com/api/intelligence`
|
||||
|
||||
### Meetings
|
||||
- `GET /meetings` - List all meetings
|
||||
- `GET /meetings/{id}` - Get meeting details
|
||||
- `DELETE /meetings/{id}` - Delete meeting
|
||||
|
||||
### Transcripts
|
||||
- `GET /meetings/{id}/transcript` - Get full transcript
|
||||
- `GET /meetings/{id}/transcript/text` - Get as plain text
|
||||
- `GET /meetings/{id}/speakers` - Get speaker statistics
|
||||
|
||||
### Summaries
|
||||
- `GET /meetings/{id}/summary` - Get AI summary
|
||||
- `POST /meetings/{id}/summary` - Generate summary
|
||||
|
||||
### Search
|
||||
- `POST /search` - Search transcripts (text + semantic)
|
||||
- `GET /search/suggest` - Get search suggestions
|
||||
|
||||
### Export
|
||||
- `GET /meetings/{id}/export?format=markdown` - Export as Markdown
|
||||
- `GET /meetings/{id}/export?format=json` - Export as JSON
|
||||
- `GET /meetings/{id}/export?format=pdf` - Export as PDF
|
||||
|
||||
### Webhooks
|
||||
- `POST /webhooks/recording-complete` - Jibri recording callback
|
||||
|
||||
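
As an example, `POST /search` takes a JSON body matching the API's `SearchRequest` model (`query`, optional `meeting_id`, `search_type`, `limit`). A sketch of a combined text + semantic query against the base URL above (no authentication shown):

```python
import httpx

resp = httpx.post(
    "https://meet.jeffemmett.com/api/intelligence/search",
    json={
        "query": "quarterly budget",   # free text, minimum 2 characters
        "search_type": "combined",     # "text", "semantic", or "combined"
        "limit": 10,
    },
    timeout=30.0,
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(f"{hit['score']:.2f} [{hit['search_type']}] {hit['text'][:80]}")
```
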
## Processing Pipeline

1. **Recording** - Jibri joins meeting and records
2. **Webhook** - Jibri calls `/webhooks/recording-complete`
3. **Audio Extraction** - FFmpeg extracts audio from video
4. **Transcription** - whisper.cpp transcribes audio
5. **Diarization** - resemblyzer identifies speakers
6. **Embedding** - Generate vector embeddings for search
7. **Summary** - Ollama generates AI summary
8. **Ready** - Meeting available in dashboard
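
The pipeline is kicked off by Jibri's finalize hook posting to the webhook. A sketch of that call with an illustrative payload (field names mirror the API's `RecordingCompletePayload` model; the room name and path below are made up):

```python
import httpx

payload = {
    "event_type": "recording-complete",               # free-form event label
    "conference_id": "weekly-sync",                   # illustrative room name
    "recording_path": "/recordings/weekly-sync.mp4",  # illustrative path
    "file_size_bytes": 104857600,                     # optional
}
resp = httpx.post(
    "http://localhost:8000/webhooks/recording-complete",
    json=payload,
    timeout=30.0,
)
print(resp.json())  # {"status": "accepted", "meeting_id": "...", "message": "Recording queued for processing"}
```
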
## Resource Usage

| Service | CPU | RAM | Storage |
|---------|-----|-----|---------|
| Transcriber | 8 cores | 12GB | 5GB (models) |
| API | 1 core | 2GB | - |
| PostgreSQL | 2 cores | 4GB | ~50GB |
| Jibri | 2 cores | 4GB | - |
| Redis | 0.5 cores | 512MB | - |

## Troubleshooting

### Transcription is slow
- Check CPU usage: `docker stats meeting-intelligence-transcriber`
- Increase `WHISPER_THREADS` in docker-compose.yml
- Consider using the `tiny` model for faster (less accurate) transcription

### No summary generated
- Check that Ollama is running: `curl http://localhost:11434/api/tags`
- Check logs: `docker compose logs api`
- Verify the model is available: `ollama list`

### Recording not starting
- Check Jibri logs: `docker compose logs jibri`
- Verify XMPP credentials in `.env`
- Check the Prosody recorder virtual host configuration

## Cost Analysis

| Component | Monthly Cost |
|-----------|-------------|
| Jibri recording | $0 (local) |
| Whisper transcription | $0 (local CPU) |
| Ollama summarization | $0 (local) |
| PostgreSQL | $0 (local) |
| **Total** | **$0/month** |

@@ -0,0 +1,32 @@
# Meeting Intelligence API
# Provides REST API for meeting transcripts, summaries, and search

FROM python:3.11-slim

# Install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/

# Create directories
RUN mkdir -p /recordings /logs

# Environment variables
ENV PYTHONUNBUFFERED=1

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the service
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

@@ -0,0 +1 @@
# Meeting Intelligence API

@@ -0,0 +1,50 @@
"""
Configuration settings for the Meeting Intelligence API.
"""

from typing import List
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

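    # pydantic-settings matches env vars to field names case-insensitively (e.g. POSTGRES_URL -> postgres_url).
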
    # Database
    postgres_url: str = "postgresql://meeting_intelligence:changeme@localhost:5432/meeting_intelligence"

    # Redis
    redis_url: str = "redis://localhost:6379"

    # Ollama (for AI summaries)
    ollama_url: str = "http://localhost:11434"
    ollama_model: str = "llama3.2"

    # File paths
    recordings_path: str = "/recordings"

    # Security
    secret_key: str = "changeme"
    api_key: str = ""  # Optional API key authentication

    # CORS
    cors_origins: List[str] = [
        "https://meet.jeffemmett.com",
        "http://localhost:8080",
        "http://localhost:3000"
    ]

    # Embeddings model for semantic search
    embedding_model: str = "all-MiniLM-L6-v2"

    # Export settings
    export_temp_dir: str = "/tmp/exports"

    # Transcriber service URL
    transcriber_url: str = "http://transcriber:8001"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"


settings = Settings()

@@ -0,0 +1,355 @@
"""
Database operations for the Meeting Intelligence API.
"""

import uuid
from datetime import datetime
from typing import Optional, List, Dict, Any

import asyncpg
import structlog

log = structlog.get_logger()


class Database:
    """Database operations for Meeting Intelligence API."""

    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self):
        """Establish database connection pool."""
        log.info("Connecting to database...")
        self.pool = await asyncpg.create_pool(
            self.connection_string,
            min_size=2,
            max_size=20
        )
        log.info("Database connected")

    async def disconnect(self):
        """Close database connection pool."""
        if self.pool:
            await self.pool.close()
            log.info("Database disconnected")

    async def health_check(self):
        """Check database connectivity."""
        async with self.pool.acquire() as conn:
            await conn.fetchval("SELECT 1")

    # ==================== Meetings ====================

    async def list_meetings(
        self,
        limit: int = 50,
        offset: int = 0,
        status: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """List meetings with pagination."""
        async with self.pool.acquire() as conn:
            if status:
                rows = await conn.fetch("""
                    SELECT id, conference_id, conference_name, title,
                           started_at, ended_at, duration_seconds,
                           status, created_at
                    FROM meetings
                    WHERE status = $1
                    ORDER BY created_at DESC
                    LIMIT $2 OFFSET $3
                """, status, limit, offset)
            else:
                rows = await conn.fetch("""
                    SELECT id, conference_id, conference_name, title,
                           started_at, ended_at, duration_seconds,
                           status, created_at
                    FROM meetings
                    ORDER BY created_at DESC
                    LIMIT $1 OFFSET $2
                """, limit, offset)

            return [dict(row) for row in rows]

    async def get_meeting(self, meeting_id: str) -> Optional[Dict[str, Any]]:
        """Get meeting details."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                SELECT m.id, m.conference_id, m.conference_name, m.title,
                       m.started_at, m.ended_at, m.duration_seconds,
                       m.recording_path, m.audio_path, m.status,
                       m.metadata, m.created_at,
                       (SELECT COUNT(*) FROM transcripts WHERE meeting_id = m.id) as segment_count,
                       (SELECT COUNT(*) FROM meeting_participants WHERE meeting_id = m.id) as participant_count,
                       (SELECT id FROM summaries WHERE meeting_id = m.id LIMIT 1) as summary_id
                FROM meetings m
                WHERE m.id = $1::uuid
            """, meeting_id)

        if row:
            return dict(row)
        return None

    async def create_meeting(
        self,
        conference_id: str,
        conference_name: Optional[str] = None,
        title: Optional[str] = None,
        recording_path: Optional[str] = None,
        started_at: Optional[datetime] = None,
        metadata: Optional[dict] = None
    ) -> str:
        """Create a new meeting record."""
        meeting_id = str(uuid.uuid4())

        async with self.pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO meetings (
                    id, conference_id, conference_name, title,
                    recording_path, started_at, status, metadata
                )
                VALUES ($1, $2, $3, $4, $5, $6, 'recording', $7)
            """, meeting_id, conference_id, conference_name, title,
                recording_path, started_at or datetime.utcnow(), metadata or {})

        return meeting_id

    async def update_meeting(
        self,
        meeting_id: str,
        **kwargs
    ):
        """Update meeting fields."""
        if not kwargs:
            return

        set_clauses = []
        values = []
        i = 1

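        # Only whitelisted column names are interpolated into the SQL; values themselves stay parameterized.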
        for key, value in kwargs.items():
            if key in ['status', 'title', 'ended_at', 'duration_seconds',
                       'recording_path', 'audio_path', 'error_message']:
                set_clauses.append(f"{key} = ${i}")
                values.append(value)
                i += 1

        if not set_clauses:
            return

        values.append(meeting_id)

        async with self.pool.acquire() as conn:
            await conn.execute(f"""
                UPDATE meetings
                SET {', '.join(set_clauses)}, updated_at = NOW()
                WHERE id = ${i}::uuid
            """, *values)

    # ==================== Transcripts ====================

    async def get_transcript(
        self,
        meeting_id: str,
        speaker_filter: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """Get transcript segments for a meeting."""
        async with self.pool.acquire() as conn:
            if speaker_filter:
                rows = await conn.fetch("""
                    SELECT id, segment_index, start_time, end_time,
                           speaker_id, speaker_name, speaker_label,
                           text, confidence, language
                    FROM transcripts
                    WHERE meeting_id = $1::uuid AND speaker_id = $2
                    ORDER BY segment_index ASC
                """, meeting_id, speaker_filter)
            else:
                rows = await conn.fetch("""
                    SELECT id, segment_index, start_time, end_time,
                           speaker_id, speaker_name, speaker_label,
                           text, confidence, language
                    FROM transcripts
                    WHERE meeting_id = $1::uuid
                    ORDER BY segment_index ASC
                """, meeting_id)

            return [dict(row) for row in rows]

    async def get_speakers(self, meeting_id: str) -> List[Dict[str, Any]]:
        """Get speaker statistics for a meeting."""
        async with self.pool.acquire() as conn:
            rows = await conn.fetch("""
                SELECT speaker_id, speaker_label,
                       COUNT(*) as segment_count,
                       SUM(end_time - start_time) as speaking_time,
                       SUM(LENGTH(text)) as character_count
                FROM transcripts
                WHERE meeting_id = $1::uuid AND speaker_id IS NOT NULL
                GROUP BY speaker_id, speaker_label
                ORDER BY speaking_time DESC
            """, meeting_id)

            return [dict(row) for row in rows]

    # ==================== Summaries ====================

    async def get_summary(self, meeting_id: str) -> Optional[Dict[str, Any]]:
        """Get AI summary for a meeting."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                SELECT id, meeting_id, summary_text, key_points,
                       action_items, decisions, topics, sentiment,
                       model_used, generated_at
                FROM summaries
                WHERE meeting_id = $1::uuid
                ORDER BY generated_at DESC
                LIMIT 1
            """, meeting_id)

        if row:
            return dict(row)
        return None

    async def save_summary(
        self,
        meeting_id: str,
        summary_text: str,
        key_points: List[str],
        action_items: List[dict],
        decisions: List[str],
        topics: List[dict],
        sentiment: str,
        model_used: str,
        prompt_tokens: int = 0,
        completion_tokens: int = 0
    ) -> int:
        """Save AI-generated summary."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                INSERT INTO summaries (
                    meeting_id, summary_text, key_points, action_items,
                    decisions, topics, sentiment, model_used,
                    prompt_tokens, completion_tokens
                )
                VALUES ($1::uuid, $2, $3, $4, $5, $6, $7, $8, $9, $10)
                RETURNING id
            """, meeting_id, summary_text, key_points, action_items,
                decisions, topics, sentiment, model_used,
                prompt_tokens, completion_tokens)

        return row["id"]

    # ==================== Search ====================

    async def fulltext_search(
        self,
        query: str,
        meeting_id: Optional[str] = None,
        limit: int = 50
    ) -> List[Dict[str, Any]]:
        """Full-text search across transcripts."""
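        # NOTE: to_tsvector is recomputed per row here; a large archive would want a GIN index on to_tsvector('english', text).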
        async with self.pool.acquire() as conn:
            if meeting_id:
                rows = await conn.fetch("""
                    SELECT t.id, t.meeting_id, t.start_time, t.end_time,
                           t.speaker_label, t.text, m.title as meeting_title,
                           ts_rank(to_tsvector('english', t.text),
                                   plainto_tsquery('english', $1)) as rank
                    FROM transcripts t
                    JOIN meetings m ON t.meeting_id = m.id
                    WHERE t.meeting_id = $2::uuid
                      AND to_tsvector('english', t.text) @@ plainto_tsquery('english', $1)
                    ORDER BY rank DESC
                    LIMIT $3
                """, query, meeting_id, limit)
            else:
                rows = await conn.fetch("""
                    SELECT t.id, t.meeting_id, t.start_time, t.end_time,
                           t.speaker_label, t.text, m.title as meeting_title,
                           ts_rank(to_tsvector('english', t.text),
                                   plainto_tsquery('english', $1)) as rank
                    FROM transcripts t
                    JOIN meetings m ON t.meeting_id = m.id
                    WHERE to_tsvector('english', t.text) @@ plainto_tsquery('english', $1)
                    ORDER BY rank DESC
                    LIMIT $2
                """, query, limit)

            return [dict(row) for row in rows]

    async def semantic_search(
        self,
        embedding: List[float],
        meeting_id: Optional[str] = None,
        threshold: float = 0.7,
        limit: int = 20
    ) -> List[Dict[str, Any]]:
        """Semantic search using vector embeddings."""
        async with self.pool.acquire() as conn:
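            # pgvector accepts the query vector as a '[x,y,...]' literal; <=> is cosine distance, so 1 - distance gives similarity.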
            embedding_str = f"[{','.join(map(str, embedding))}]"

            if meeting_id:
                rows = await conn.fetch("""
                    SELECT te.transcript_id, te.meeting_id, te.chunk_text,
                           t.start_time, t.speaker_label, m.title as meeting_title,
                           1 - (te.embedding <=> $1::vector) as similarity
                    FROM transcript_embeddings te
                    JOIN transcripts t ON te.transcript_id = t.id
                    JOIN meetings m ON te.meeting_id = m.id
                    WHERE te.meeting_id = $2::uuid
                      AND 1 - (te.embedding <=> $1::vector) > $3
                    ORDER BY te.embedding <=> $1::vector
                    LIMIT $4
                """, embedding_str, meeting_id, threshold, limit)
            else:
                rows = await conn.fetch("""
                    SELECT te.transcript_id, te.meeting_id, te.chunk_text,
                           t.start_time, t.speaker_label, m.title as meeting_title,
                           1 - (te.embedding <=> $1::vector) as similarity
                    FROM transcript_embeddings te
                    JOIN transcripts t ON te.transcript_id = t.id
                    JOIN meetings m ON te.meeting_id = m.id
                    WHERE 1 - (te.embedding <=> $1::vector) > $2
                    ORDER BY te.embedding <=> $1::vector
                    LIMIT $3
                """, embedding_str, threshold, limit)

            return [dict(row) for row in rows]

    # ==================== Webhooks ====================

    async def save_webhook_event(
        self,
        event_type: str,
        payload: dict
    ) -> int:
        """Save a webhook event for processing."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                INSERT INTO webhook_events (event_type, payload)
                VALUES ($1, $2)
                RETURNING id
            """, event_type, payload)

        return row["id"]

    # ==================== Jobs ====================

    async def create_job(
        self,
        meeting_id: str,
        job_type: str,
        priority: int = 5,
        result: Optional[dict] = None
    ) -> int:
        """Create a processing job."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                INSERT INTO processing_jobs (meeting_id, job_type, priority, result)
                VALUES ($1::uuid, $2, $3, $4)
                RETURNING id
            """, meeting_id, job_type, priority, result or {})

        return row["id"]

@@ -0,0 +1,113 @@
"""
Meeting Intelligence API

Provides REST API for:
- Meeting management
- Transcript retrieval
- AI-powered summaries
- Semantic search
- Export functionality
"""

import os
from contextlib import asynccontextmanager
from typing import Optional

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from .config import settings
from .database import Database
from .routes import meetings, transcripts, summaries, search, webhooks, export

import structlog

log = structlog.get_logger()


# Application state
class AppState:
    db: Optional[Database] = None


state = AppState()


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application startup and shutdown."""
    log.info("Starting Meeting Intelligence API...")

    # Initialize database
    state.db = Database(settings.postgres_url)
    await state.db.connect()

    # Make database available to routes
    app.state.db = state.db

    log.info("Meeting Intelligence API started successfully")

    yield

    # Shutdown
    log.info("Shutting down Meeting Intelligence API...")
    if state.db:
        await state.db.disconnect()

    log.info("Meeting Intelligence API stopped")


app = FastAPI(
    title="Meeting Intelligence API",
    description="API for meeting transcripts, summaries, and search",
    version="1.0.0",
    lifespan=lifespan,
    docs_url="/docs",
    redoc_url="/redoc"
)

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include routers
app.include_router(meetings.router, prefix="/meetings", tags=["Meetings"])
app.include_router(transcripts.router, prefix="/meetings", tags=["Transcripts"])
app.include_router(summaries.router, prefix="/meetings", tags=["Summaries"])
app.include_router(search.router, prefix="/search", tags=["Search"])
app.include_router(webhooks.router, prefix="/webhooks", tags=["Webhooks"])
app.include_router(export.router, prefix="/meetings", tags=["Export"])


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    db_ok = False

    try:
        if state.db:
            await state.db.health_check()
            db_ok = True
    except Exception as e:
        log.error("Database health check failed", error=str(e))

    return {
        "status": "healthy" if db_ok else "unhealthy",
        "database": db_ok,
        "version": "1.0.0"
    }


@app.get("/")
async def root():
    """Root endpoint."""
    return {
        "service": "Meeting Intelligence API",
        "version": "1.0.0",
        "docs": "/docs"
    }

@@ -0,0 +1,2 @@
# API Routes
from . import meetings, transcripts, summaries, search, webhooks, export

@@ -0,0 +1,319 @@
"""
Export routes for Meeting Intelligence.

Supports exporting meetings as PDF, Markdown, and JSON.
"""

import io
import json
import os
from datetime import datetime
from typing import Optional

from fastapi import APIRouter, HTTPException, Request, Response
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

import structlog

log = structlog.get_logger()

router = APIRouter()


class ExportRequest(BaseModel):
    format: str = "markdown"  # "pdf", "markdown", "json"
    include_transcript: bool = True
    include_summary: bool = True


@router.get("/{meeting_id}/export")
async def export_meeting(
    request: Request,
    meeting_id: str,
    format: str = "markdown",
    include_transcript: bool = True,
    include_summary: bool = True
):
    """Export meeting data in various formats."""
    db = request.app.state.db

    # Get meeting data
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    # Get transcript if requested
    transcript = None
    if include_transcript:
        transcript = await db.get_transcript(meeting_id)

    # Get summary if requested
    summary = None
    if include_summary:
        summary = await db.get_summary(meeting_id)

    # Export based on format
    if format == "json":
        return _export_json(meeting, transcript, summary)
    elif format == "markdown":
        return _export_markdown(meeting, transcript, summary)
    elif format == "pdf":
        return await _export_pdf(meeting, transcript, summary)
    else:
        raise HTTPException(
            status_code=400,
            detail=f"Unsupported format: {format}. Use: json, markdown, pdf"
        )


def _export_json(meeting: dict, transcript: list, summary: dict) -> Response:
    """Export as JSON."""
    data = {
        "meeting": {
            "id": str(meeting["id"]),
            "conference_id": meeting["conference_id"],
            "title": meeting.get("title"),
            "started_at": meeting["started_at"].isoformat() if meeting.get("started_at") else None,
            "ended_at": meeting["ended_at"].isoformat() if meeting.get("ended_at") else None,
            "duration_seconds": meeting.get("duration_seconds"),
            "status": meeting["status"]
        },
        "transcript": [
            {
                "start_time": s["start_time"],
                "end_time": s["end_time"],
                "speaker": s.get("speaker_label"),
                "text": s["text"]
            }
            for s in (transcript or [])
        ] if transcript else None,
        "summary": {
            "text": summary["summary_text"],
            "key_points": summary["key_points"],
            "action_items": summary["action_items"],
            "decisions": summary["decisions"],
            "topics": summary["topics"],
            "sentiment": summary.get("sentiment")
        } if summary else None,
        "exported_at": datetime.utcnow().isoformat()
    }

    filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.json"

    return Response(
        content=json.dumps(data, indent=2),
        media_type="application/json",
        headers={
            "Content-Disposition": f'attachment; filename="{filename}"'
        }
    )


def _export_markdown(meeting: dict, transcript: list, summary: dict) -> Response:
    """Export as Markdown."""
    lines = []

    # Header
    title = meeting.get("title") or f"Meeting: {meeting['conference_id']}"
    lines.append(f"# {title}")
    lines.append("")

    # Metadata
    lines.append("## Meeting Details")
    lines.append("")
    lines.append(f"- **Conference ID:** {meeting['conference_id']}")
    if meeting.get("started_at"):
        lines.append(f"- **Date:** {meeting['started_at'].strftime('%Y-%m-%d %H:%M UTC')}")
    if meeting.get("duration_seconds"):
        minutes = meeting["duration_seconds"] // 60
        lines.append(f"- **Duration:** {minutes} minutes")
    lines.append(f"- **Status:** {meeting['status']}")
    lines.append("")

    # Summary
    if summary:
        lines.append("## Summary")
        lines.append("")
        lines.append(summary["summary_text"])
        lines.append("")

        # Key Points
        if summary.get("key_points"):
            lines.append("### Key Points")
            lines.append("")
            for point in summary["key_points"]:
                lines.append(f"- {point}")
            lines.append("")

        # Action Items
        if summary.get("action_items"):
            lines.append("### Action Items")
            lines.append("")
            for item in summary["action_items"]:
                task = item.get("task", item) if isinstance(item, dict) else item
                assignee = item.get("assignee", "") if isinstance(item, dict) else ""
                checkbox = "[ ]"
                if assignee:
                    lines.append(f"- {checkbox} {task} *(Assigned: {assignee})*")
                else:
                    lines.append(f"- {checkbox} {task}")
            lines.append("")

        # Decisions
        if summary.get("decisions"):
            lines.append("### Decisions")
            lines.append("")
            for decision in summary["decisions"]:
                lines.append(f"- {decision}")
            lines.append("")

    # Transcript
    if transcript:
        lines.append("## Transcript")
        lines.append("")

        current_speaker = None
        for segment in transcript:
            speaker = segment.get("speaker_label") or "Speaker"
            time_str = _format_time(segment["start_time"])

            if speaker != current_speaker:
                lines.append("")
                lines.append(f"**{speaker}** *({time_str})*")
                current_speaker = speaker

            lines.append(f"> {segment['text']}")

        lines.append("")

    # Footer
    lines.append("---")
    lines.append(f"*Exported on {datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')} by Meeting Intelligence*")

    content = "\n".join(lines)
    filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.md"

    return Response(
        content=content,
        media_type="text/markdown",
        headers={
            "Content-Disposition": f'attachment; filename="{filename}"'
        }
    )


async def _export_pdf(meeting: dict, transcript: list, summary: dict) -> StreamingResponse:
    """Export as PDF using reportlab."""
    try:
        from reportlab.lib.pagesizes import letter
        from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
        from reportlab.lib.units import inch
        from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, ListFlowable, ListItem
    except ImportError:
        raise HTTPException(
            status_code=501,
            detail="PDF export requires reportlab. Use markdown or json format."
        )

    buffer = io.BytesIO()

    # Create PDF document
    doc = SimpleDocTemplate(
        buffer,
        pagesize=letter,
        rightMargin=72,
        leftMargin=72,
        topMargin=72,
        bottomMargin=72
    )

    styles = getSampleStyleSheet()
    story = []

    # Title
    title = meeting.get("title") or f"Meeting: {meeting['conference_id']}"
    story.append(Paragraph(title, styles['Title']))
    story.append(Spacer(1, 12))

    # Metadata
    story.append(Paragraph("Meeting Details", styles['Heading2']))
    if meeting.get("started_at"):
        story.append(Paragraph(
            f"Date: {meeting['started_at'].strftime('%Y-%m-%d %H:%M UTC')}",
            styles['Normal']
        ))
    if meeting.get("duration_seconds"):
        minutes = meeting["duration_seconds"] // 60
        story.append(Paragraph(f"Duration: {minutes} minutes", styles['Normal']))
    story.append(Spacer(1, 12))

    # Summary
    if summary:
        story.append(Paragraph("Summary", styles['Heading2']))
        story.append(Paragraph(summary["summary_text"], styles['Normal']))
        story.append(Spacer(1, 12))

        if summary.get("key_points"):
            story.append(Paragraph("Key Points", styles['Heading3']))
            for point in summary["key_points"]:
                story.append(Paragraph(f"• {point}", styles['Normal']))
            story.append(Spacer(1, 12))

        if summary.get("action_items"):
            story.append(Paragraph("Action Items", styles['Heading3']))
            for item in summary["action_items"]:
                task = item.get("task", item) if isinstance(item, dict) else item
                story.append(Paragraph(f"☐ {task}", styles['Normal']))
            story.append(Spacer(1, 12))

    # Transcript (abbreviated for PDF)
    if transcript:
        story.append(Paragraph("Transcript", styles['Heading2']))
        current_speaker = None

        for segment in transcript[:100]:  # Limit segments for PDF
            speaker = segment.get("speaker_label") or "Speaker"

            if speaker != current_speaker:
                story.append(Spacer(1, 6))
                story.append(Paragraph(
                    f"<b>{speaker}</b> ({_format_time(segment['start_time'])})",
                    styles['Normal']
                ))
                current_speaker = speaker

            story.append(Paragraph(segment['text'], styles['Normal']))

        if len(transcript) > 100:
            story.append(Spacer(1, 12))
            story.append(Paragraph(
                f"[... {len(transcript) - 100} more segments not shown in PDF]",
                styles['Normal']
            ))

    # Build PDF
    doc.build(story)
    buffer.seek(0)

    filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.pdf"

    return StreamingResponse(
        buffer,
        media_type="application/pdf",
        headers={
            "Content-Disposition": f'attachment; filename="{filename}"'
        }
    )


def _format_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS or MM:SS."""
    total_seconds = int(seconds)
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    secs = total_seconds % 60

    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

@@ -0,0 +1,112 @@
"""
Meeting management routes.
"""

from typing import Optional, List

from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel

import structlog

log = structlog.get_logger()

router = APIRouter()


class MeetingResponse(BaseModel):
    id: str
    conference_id: str
    conference_name: Optional[str]
    title: Optional[str]
    started_at: Optional[str]
    ended_at: Optional[str]
    duration_seconds: Optional[int]
    status: str
    created_at: str
    segment_count: Optional[int] = None
    participant_count: Optional[int] = None
    has_summary: Optional[bool] = None


class MeetingListResponse(BaseModel):
    meetings: List[MeetingResponse]
    total: int
    limit: int
    offset: int


@router.get("", response_model=MeetingListResponse)
async def list_meetings(
    request: Request,
    limit: int = Query(default=50, le=100),
    offset: int = Query(default=0, ge=0),
    status: Optional[str] = Query(default=None)
):
    """List all meetings with pagination."""
    db = request.app.state.db

    meetings = await db.list_meetings(limit=limit, offset=offset, status=status)

    return MeetingListResponse(
        meetings=[
            MeetingResponse(
                id=str(m["id"]),
                conference_id=m["conference_id"],
                conference_name=m.get("conference_name"),
                title=m.get("title"),
                started_at=m["started_at"].isoformat() if m.get("started_at") else None,
                ended_at=m["ended_at"].isoformat() if m.get("ended_at") else None,
                duration_seconds=m.get("duration_seconds"),
                status=m["status"],
                created_at=m["created_at"].isoformat()
            )
            for m in meetings
        ],
        total=len(meetings),  # TODO: Add total count query
        limit=limit,
        offset=offset
    )


@router.get("/{meeting_id}", response_model=MeetingResponse)
async def get_meeting(request: Request, meeting_id: str):
    """Get meeting details."""
    db = request.app.state.db

    meeting = await db.get_meeting(meeting_id)

    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    return MeetingResponse(
        id=str(meeting["id"]),
        conference_id=meeting["conference_id"],
        conference_name=meeting.get("conference_name"),
        title=meeting.get("title"),
        started_at=meeting["started_at"].isoformat() if meeting.get("started_at") else None,
        ended_at=meeting["ended_at"].isoformat() if meeting.get("ended_at") else None,
        duration_seconds=meeting.get("duration_seconds"),
        status=meeting["status"],
        created_at=meeting["created_at"].isoformat(),
        segment_count=meeting.get("segment_count"),
        participant_count=meeting.get("participant_count"),
        has_summary=meeting.get("summary_id") is not None
    )


@router.delete("/{meeting_id}")
async def delete_meeting(request: Request, meeting_id: str):
    """Delete a meeting and all associated data."""
    db = request.app.state.db

    meeting = await db.get_meeting(meeting_id)

    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    # TODO: Implement cascade delete
    # For now, just mark as deleted
    await db.update_meeting(meeting_id, status="deleted")

    return {"status": "deleted", "meeting_id": meeting_id}

@@ -0,0 +1,173 @@
"""
Search routes for Meeting Intelligence.
"""

from typing import Optional, List

from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

from ..config import settings

import structlog

log = structlog.get_logger()

router = APIRouter()

# Lazy-load embedding model
_embedding_model = None


def get_embedding_model():
    """Get or initialize the embedding model."""
    global _embedding_model
    if _embedding_model is None:
        log.info("Loading embedding model...", model=settings.embedding_model)
        _embedding_model = SentenceTransformer(settings.embedding_model)
        log.info("Embedding model loaded")
    return _embedding_model


class SearchResult(BaseModel):
    meeting_id: str
    meeting_title: Optional[str]
    text: str
    start_time: Optional[float]
    speaker_label: Optional[str]
    score: float
    search_type: str


class SearchResponse(BaseModel):
    query: str
    results: List[SearchResult]
    total: int
    search_type: str


class SearchRequest(BaseModel):
    query: str
    meeting_id: Optional[str] = None
    search_type: str = "combined"  # "text", "semantic", "combined"
    limit: int = 20


@router.post("", response_model=SearchResponse)
async def search_transcripts(request: Request, body: SearchRequest):
    """Search across meeting transcripts.

    Search types:
    - text: Full-text search using PostgreSQL ts_vector
    - semantic: Semantic search using vector embeddings
    - combined: Both text and semantic search, merged results
    """
    db = request.app.state.db

    if not body.query or len(body.query.strip()) < 2:
        raise HTTPException(
            status_code=400,
            detail="Query must be at least 2 characters"
        )

    results = []

    # Full-text search
    if body.search_type in ["text", "combined"]:
        text_results = await db.fulltext_search(
            query=body.query,
            meeting_id=body.meeting_id,
            limit=body.limit
        )

        for r in text_results:
            results.append(SearchResult(
                meeting_id=str(r["meeting_id"]),
                meeting_title=r.get("meeting_title"),
                text=r["text"],
                start_time=r.get("start_time"),
                speaker_label=r.get("speaker_label"),
                score=float(r["rank"]),
                search_type="text"
            ))

    # Semantic search
    if body.search_type in ["semantic", "combined"]:
        try:
            model = get_embedding_model()
            query_embedding = model.encode(body.query).tolist()

            semantic_results = await db.semantic_search(
                embedding=query_embedding,
                meeting_id=body.meeting_id,
                threshold=0.6,
                limit=body.limit
            )

            for r in semantic_results:
                results.append(SearchResult(
                    meeting_id=str(r["meeting_id"]),
                    meeting_title=r.get("meeting_title"),
                    text=r["chunk_text"],
                    start_time=r.get("start_time"),
                    speaker_label=r.get("speaker_label"),
                    score=float(r["similarity"]),
                    search_type="semantic"
                ))

        except Exception as e:
            log.error("Semantic search failed", error=str(e))
            if body.search_type == "semantic":
                raise HTTPException(
                    status_code=500,
                    detail=f"Semantic search failed: {str(e)}"
                )

    # Deduplicate and sort by score
    seen = set()
    unique_results = []
    for r in sorted(results, key=lambda x: x.score, reverse=True):
        key = (r.meeting_id, r.text[:100])
        if key not in seen:
            seen.add(key)
            unique_results.append(r)

    return SearchResponse(
        query=body.query,
        results=unique_results[:body.limit],
        total=len(unique_results),
        search_type=body.search_type
    )


@router.get("/suggest")
async def search_suggestions(
    request: Request,
    q: str = Query(..., min_length=2)
):
    """Get search suggestions based on partial query."""
    db = request.app.state.db

    # Simple prefix search on common terms
    results = await db.fulltext_search(query=q, limit=5)

    # Extract unique phrases
    suggestions = []
    for r in results:
        # Get surrounding context
        text = r["text"]
        words = text.split()

        # Find matching words and get context
        for i, word in enumerate(words):
            if q.lower() in word.lower():
                start = max(0, i - 2)
                end = min(len(words), i + 3)
                phrase = " ".join(words[start:end])
                if phrase not in suggestions:
                    suggestions.append(phrase)
                if len(suggestions) >= 5:
                    break

    return {"suggestions": suggestions}

@@ -0,0 +1,251 @@
"""
AI Summary routes.
"""

import json
from typing import Optional, List

import httpx
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from pydantic import BaseModel

from ..config import settings

import structlog

log = structlog.get_logger()

router = APIRouter()


class ActionItem(BaseModel):
    task: str
    assignee: Optional[str] = None
    due_date: Optional[str] = None
    completed: bool = False


class Topic(BaseModel):
    topic: str
    duration_seconds: Optional[float] = None
    relevance_score: Optional[float] = None


class SummaryResponse(BaseModel):
    meeting_id: str
    summary_text: str
    key_points: List[str]
    action_items: List[ActionItem]
    decisions: List[str]
    topics: List[Topic]
    sentiment: Optional[str]
    model_used: str
    generated_at: str


class GenerateSummaryRequest(BaseModel):
    force_regenerate: bool = False


# Summarization prompt template
SUMMARY_PROMPT = """You are analyzing a meeting transcript. Your task is to extract key information and provide a structured summary.

## Meeting Transcript:
{transcript}

## Instructions:
Analyze the transcript and extract the following information. Be concise and accurate.

Respond ONLY with a valid JSON object in this exact format (no markdown, no extra text):
{{
    "summary": "A 2-3 sentence overview of what was discussed in the meeting",
    "key_points": ["Point 1", "Point 2", "Point 3"],
    "action_items": [
        {{"task": "Description of task", "assignee": "Person name or null", "due_date": "Date or null"}}
    ],
    "decisions": ["Decision 1", "Decision 2"],
    "topics": [
        {{"topic": "Topic name", "relevance_score": 0.9}}
    ],
    "sentiment": "positive" or "neutral" or "negative" or "mixed"
}}

Remember:
- key_points: 3-5 most important points discussed
- action_items: Tasks that need to be done, with assignees if mentioned
- decisions: Any decisions or conclusions reached
- topics: Main themes discussed with relevance scores (0-1)
- sentiment: Overall tone of the meeting
"""


@router.get("/{meeting_id}/summary", response_model=SummaryResponse)
async def get_summary(request: Request, meeting_id: str):
    """Get AI-generated summary for a meeting."""
    db = request.app.state.db

    # Verify meeting exists
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    summary = await db.get_summary(meeting_id)

    if not summary:
        raise HTTPException(
            status_code=404,
            detail="No summary available. Use POST to generate one."
        )

    return SummaryResponse(
        meeting_id=meeting_id,
        summary_text=summary["summary_text"],
        key_points=summary["key_points"] or [],
        action_items=[
            ActionItem(**item) for item in (summary["action_items"] or [])
        ],
        decisions=summary["decisions"] or [],
        topics=[
            Topic(**topic) for topic in (summary["topics"] or [])
        ],
        sentiment=summary.get("sentiment"),
        model_used=summary["model_used"],
        generated_at=summary["generated_at"].isoformat()
    )


@router.post("/{meeting_id}/summary", response_model=SummaryResponse)
async def generate_summary(
    request: Request,
    meeting_id: str,
    body: GenerateSummaryRequest,
    background_tasks: BackgroundTasks
):
    """Generate AI summary for a meeting."""
    db = request.app.state.db

    # Verify meeting exists
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    # Check if summary already exists
    if not body.force_regenerate:
        existing = await db.get_summary(meeting_id)
        if existing:
            raise HTTPException(
                status_code=409,
                detail="Summary already exists. Set force_regenerate=true to regenerate."
            )

    # Get transcript
    segments = await db.get_transcript(meeting_id)
    if not segments:
        raise HTTPException(
            status_code=400,
            detail="No transcript available for summarization"
        )

    # Format transcript for LLM
    transcript_text = _format_transcript(segments)

    # Generate summary using Ollama
    summary_data = await _generate_summary_with_ollama(transcript_text)

    # Save summary
    await db.save_summary(
        meeting_id=meeting_id,
        summary_text=summary_data["summary"],
        key_points=summary_data["key_points"],
        action_items=summary_data["action_items"],
        decisions=summary_data["decisions"],
        topics=summary_data["topics"],
        sentiment=summary_data["sentiment"],
        model_used=settings.ollama_model
    )

    # Update meeting status
    await db.update_meeting(meeting_id, status="ready")

    # Get the saved summary
    summary = await db.get_summary(meeting_id)

    return SummaryResponse(
        meeting_id=meeting_id,
        summary_text=summary["summary_text"],
        key_points=summary["key_points"] or [],
        action_items=[
            ActionItem(**item) for item in (summary["action_items"] or [])
        ],
        decisions=summary["decisions"] or [],
        topics=[
            Topic(**topic) for topic in (summary["topics"] or [])
        ],
        sentiment=summary.get("sentiment"),
        model_used=summary["model_used"],
        generated_at=summary["generated_at"].isoformat()
    )


def _format_transcript(segments: list) -> str:
    """Format transcript segments for LLM processing."""
    lines = []
    current_speaker = None

    for s in segments:
        speaker = s.get("speaker_label") or "Speaker"

        if speaker != current_speaker:
            lines.append(f"\n[{speaker}]")
            current_speaker = speaker

        lines.append(s["text"])

    return "\n".join(lines)


async def _generate_summary_with_ollama(transcript: str) -> dict:
    """Generate summary using Ollama."""
    prompt = SUMMARY_PROMPT.format(transcript=transcript[:15000])  # Limit context

    async with httpx.AsyncClient(timeout=120.0) as client:
        try:
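            # "format": "json" enables Ollama's JSON mode, constraining output to valid JSON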
            response = await client.post(
                f"{settings.ollama_url}/api/generate",
                json={
                    "model": settings.ollama_model,
                    "prompt": prompt,
                    "stream": False,
                    "format": "json"
                }
            )
            response.raise_for_status()

            result = response.json()
            response_text = result.get("response", "")

            # Parse JSON from response
            summary_data = json.loads(response_text)

            # Validate required fields
            return {
                "summary": summary_data.get("summary", "No summary generated"),
                "key_points": summary_data.get("key_points", []),
                "action_items": summary_data.get("action_items", []),
                "decisions": summary_data.get("decisions", []),
                "topics": summary_data.get("topics", []),
                "sentiment": summary_data.get("sentiment", "neutral")
            }

        except httpx.HTTPError as e:
            log.error("Ollama request failed", error=str(e))
            raise HTTPException(
                status_code=503,
                detail=f"AI service unavailable: {str(e)}"
            )
        except json.JSONDecodeError as e:
            log.error("Failed to parse Ollama response", error=str(e))
            raise HTTPException(
                status_code=500,
                detail="Failed to parse AI response"
            )

@@ -0,0 +1,161 @@
"""
Transcript routes.
"""

from typing import Optional, List

from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel

import structlog

log = structlog.get_logger()

router = APIRouter()


class TranscriptSegment(BaseModel):
    id: int
    segment_index: int
    start_time: float
    end_time: float
    speaker_id: Optional[str]
    speaker_name: Optional[str]
    speaker_label: Optional[str]
    text: str
    confidence: Optional[float]
    language: Optional[str]


class TranscriptResponse(BaseModel):
    meeting_id: str
    segments: List[TranscriptSegment]
    total_segments: int
    duration: Optional[float]


class SpeakerStats(BaseModel):
    speaker_id: str
    speaker_label: Optional[str]
    segment_count: int
    speaking_time: float
    character_count: int


class SpeakersResponse(BaseModel):
    meeting_id: str
    speakers: List[SpeakerStats]


@router.get("/{meeting_id}/transcript", response_model=TranscriptResponse)
async def get_transcript(
    request: Request,
    meeting_id: str,
    speaker: Optional[str] = Query(default=None, description="Filter by speaker ID")
):
    """Get full transcript for a meeting."""
    db = request.app.state.db

    # Verify meeting exists
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    segments = await db.get_transcript(meeting_id, speaker_filter=speaker)

    if not segments:
        raise HTTPException(
            status_code=404,
            detail="No transcript available for this meeting"
        )

    # Calculate duration from last segment
    duration = segments[-1]["end_time"] if segments else None

    return TranscriptResponse(
        meeting_id=meeting_id,
        segments=[
            TranscriptSegment(
                id=s["id"],
                segment_index=s["segment_index"],
                start_time=s["start_time"],
                end_time=s["end_time"],
                speaker_id=s.get("speaker_id"),
                speaker_name=s.get("speaker_name"),
                speaker_label=s.get("speaker_label"),
                text=s["text"],
                confidence=s.get("confidence"),
                language=s.get("language")
            )
            for s in segments
        ],
        total_segments=len(segments),
        duration=duration
    )


@router.get("/{meeting_id}/speakers", response_model=SpeakersResponse)
async def get_speakers(request: Request, meeting_id: str):
    """Get speaker statistics for a meeting."""
    db = request.app.state.db

    # Verify meeting exists
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    speakers = await db.get_speakers(meeting_id)

    return SpeakersResponse(
        meeting_id=meeting_id,
        speakers=[
            SpeakerStats(
                speaker_id=s["speaker_id"],
                speaker_label=s.get("speaker_label"),
                segment_count=s["segment_count"],
                speaking_time=float(s["speaking_time"] or 0),
                character_count=s["character_count"] or 0
            )
            for s in speakers
        ]
    )


@router.get("/{meeting_id}/transcript/text")
async def get_transcript_text(request: Request, meeting_id: str):
    """Get transcript as plain text."""
    db = request.app.state.db

    # Verify meeting exists
    meeting = await db.get_meeting(meeting_id)
    if not meeting:
        raise HTTPException(status_code=404, detail="Meeting not found")

    segments = await db.get_transcript(meeting_id)

    if not segments:
        raise HTTPException(
            status_code=404,
            detail="No transcript available for this meeting"
        )

    # Format as plain text
    lines = []
    current_speaker = None

    for s in segments:
        speaker = s.get("speaker_label") or "Unknown"

        if speaker != current_speaker:
            lines.append(f"\n{speaker}:")
            current_speaker = speaker

        lines.append(f"  {s['text']}")

    text = "\n".join(lines)

    return {
        "meeting_id": meeting_id,
        "text": text,
        "format": "plain"
    }

@@ -0,0 +1,139 @@
"""
Webhook routes for Jibri recording callbacks.
"""

from datetime import datetime
from typing import Optional

import httpx
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from pydantic import BaseModel

from ..config import settings

import structlog

log = structlog.get_logger()

router = APIRouter()


class RecordingCompletePayload(BaseModel):
    event_type: str
    conference_id: str
    recording_path: str
    recording_dir: Optional[str] = None
    file_size_bytes: Optional[int] = None
    completed_at: Optional[str] = None
    metadata: Optional[dict] = None


class WebhookResponse(BaseModel):
    status: str
    meeting_id: str
    message: str


@router.post("/recording-complete", response_model=WebhookResponse)
async def recording_complete(
    request: Request,
    payload: RecordingCompletePayload,
    background_tasks: BackgroundTasks
):
    """
    Webhook called by Jibri when a recording completes.

    This triggers the processing pipeline:
    1. Create meeting record
    2. Queue transcription job
    3. (Later) Generate summary
    """
    db = request.app.state.db

    log.info(
        "Recording complete webhook received",
        conference_id=payload.conference_id,
        recording_path=payload.recording_path
    )

    # Save webhook event for audit
    await db.save_webhook_event(
        event_type=payload.event_type,
        payload=payload.model_dump()
    )

    # Create meeting record
    meeting_id = await db.create_meeting(
        conference_id=payload.conference_id,
        conference_name=payload.conference_id,  # Use conference_id as name for now
        title=f"Meeting - {payload.conference_id}",
        recording_path=payload.recording_path,
        started_at=datetime.utcnow(),  # Will be updated from recording metadata
        metadata=payload.metadata or {}
    )

    log.info("Meeting record created", meeting_id=meeting_id)

    # Update meeting status
    await db.update_meeting(meeting_id, status="extracting_audio")

    # Queue transcription job
    job_id = await db.create_job(
        meeting_id=meeting_id,
        job_type="transcribe",
        priority=5,
        result={
            "video_path": payload.recording_path,
            "enable_diarization": True
        }
    )

    log.info("Transcription job queued", job_id=job_id, meeting_id=meeting_id)

    # Trigger transcription service asynchronously
    background_tasks.add_task(
        _notify_transcriber,
        meeting_id,
        payload.recording_path
    )

    return WebhookResponse(
        status="accepted",
        meeting_id=meeting_id,
        message="Recording queued for processing"
    )


async def _notify_transcriber(meeting_id: str, recording_path: str):
    """Notify the transcription service to start processing."""
    try:
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.post(
                f"{settings.transcriber_url}/transcribe",
                json={
                    "meeting_id": meeting_id,
                    "video_path": recording_path,
                    "enable_diarization": True
                }
            )
            response.raise_for_status()
            log.info(
                "Transcriber notified",
                meeting_id=meeting_id,
                response=response.json()
            )
    except Exception as e:
log.error(
|
||||
"Failed to notify transcriber",
|
||||
meeting_id=meeting_id,
|
||||
error=str(e)
|
||||
)
|
||||
# Job is in database, transcriber will pick it up on next poll
|
||||
|
||||
|
||||
@router.post("/test")
|
||||
async def test_webhook(request: Request):
|
||||
"""Test endpoint for webhook connectivity."""
|
||||
body = await request.json()
|
||||
log.info("Test webhook received", body=body)
|
||||
return {"status": "ok", "received": body}
|
||||
|
|
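
# --- Connectivity test sketch (illustrative) ---
# Assuming the router is mounted under /webhooks and the API listens on
# port 8000, the /test endpoint can be exercised before wiring up Jibri:
#
#   curl -X POST http://localhost:8000/webhooks/test \
#        -H "Content-Type: application/json" \
#        -d '{"ping": "pong"}'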
@ -0,0 +1,37 @@
# Meeting Intelligence API Dependencies

# Web framework
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9

# Database
asyncpg==0.29.0
sqlalchemy[asyncio]==2.0.25
psycopg2-binary==2.9.9

# Redis
redis==5.0.1

# HTTP client (for Ollama)
httpx==0.26.0
aiohttp==3.9.3

# Validation
pydantic==2.6.1
pydantic-settings==2.1.0

# Sentence embeddings (for semantic search)
sentence-transformers==2.3.1
numpy==1.26.4

# PDF export
reportlab==4.0.8
markdown2==2.4.12

# Utilities
python-dotenv==1.0.1
tenacity==8.2.3

# Logging
structlog==24.1.0
@ -0,0 +1,186 @@
# Meeting Intelligence System - Full Docker Compose
# Deploy on Netcup RS 8000 at /opt/meeting-intelligence/
#
# Components:
# - Jibri (recording)
# - Transcriber (whisper.cpp + diarization)
# - Meeting Intelligence API
# - PostgreSQL (storage)
# - Redis (job queue)

services:
  # ============================================================
  # PostgreSQL Database
  # ============================================================
  postgres:
    image: pgvector/pgvector:pg16
    container_name: meeting-intelligence-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: meeting_intelligence
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      POSTGRES_DB: meeting_intelligence
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U meeting_intelligence"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - meeting-intelligence

  # ============================================================
  # Redis Job Queue
  # ============================================================
  redis:
    image: redis:7-alpine
    container_name: meeting-intelligence-redis
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - meeting-intelligence

  # ============================================================
  # Transcription Service (whisper.cpp + diarization)
  # ============================================================
  transcriber:
    build:
      context: ./transcriber
      dockerfile: Dockerfile
    container_name: meeting-intelligence-transcriber
    restart: unless-stopped
    environment:
      REDIS_URL: redis://redis:6379
      POSTGRES_URL: postgresql://meeting_intelligence:${POSTGRES_PASSWORD:-changeme}@postgres:5432/meeting_intelligence
      WHISPER_MODEL: small
      WHISPER_THREADS: 8
      NUM_WORKERS: 4
    volumes:
      - recordings:/recordings:ro
      - audio_processed:/audio
      - whisper_models:/models
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    deploy:
      resources:
        limits:
          cpus: '12'
          memory: 16G
    networks:
      - meeting-intelligence

  # ============================================================
  # Meeting Intelligence API
  # ============================================================
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    container_name: meeting-intelligence-api
    restart: unless-stopped
    environment:
      REDIS_URL: redis://redis:6379
      POSTGRES_URL: postgresql://meeting_intelligence:${POSTGRES_PASSWORD:-changeme}@postgres:5432/meeting_intelligence
      OLLAMA_URL: http://host.docker.internal:11434
      RECORDINGS_PATH: /recordings
      SECRET_KEY: ${API_SECRET_KEY:-changeme}
    volumes:
      - recordings:/recordings
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.meeting-intelligence.rule=Host(`meet.jeffemmett.com`) && PathPrefix(`/api/intelligence`)"
      - "traefik.http.services.meeting-intelligence.loadbalancer.server.port=8000"
      - "traefik.http.routers.meeting-intelligence.middlewares=strip-intelligence-prefix"
      - "traefik.http.middlewares.strip-intelligence-prefix.stripprefix.prefixes=/api/intelligence"
    networks:
      - meeting-intelligence
      - traefik-public

  # ============================================================
  # Jibri Recording Service
  # ============================================================
  jibri:
    image: jitsi/jibri:stable-9584
    container_name: meeting-intelligence-jibri
    restart: unless-stopped
    privileged: true
    environment:
      # XMPP Connection
      XMPP_SERVER: ${XMPP_SERVER:-meet.jeffemmett.com}
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jeffemmett.com}
      XMPP_AUTH_DOMAIN: auth.${XMPP_DOMAIN:-meet.jeffemmett.com}
      XMPP_INTERNAL_MUC_DOMAIN: internal.auth.${XMPP_DOMAIN:-meet.jeffemmett.com}
      XMPP_RECORDER_DOMAIN: recorder.${XMPP_DOMAIN:-meet.jeffemmett.com}
      XMPP_MUC_DOMAIN: muc.${XMPP_DOMAIN:-meet.jeffemmett.com}

      # Jibri Settings
      JIBRI_BREWERY_MUC: JibriBrewery
      JIBRI_PENDING_TIMEOUT: 90
      JIBRI_RECORDING_DIR: /recordings
      JIBRI_FINALIZE_RECORDING_SCRIPT_PATH: /config/finalize.sh
      JIBRI_XMPP_USER: jibri
      JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD:-changeme}
      JIBRI_RECORDER_USER: recorder
      JIBRI_RECORDER_PASSWORD: ${JIBRI_RECORDER_PASSWORD:-changeme}

      # Display Settings
      DISPLAY: ":0"
      CHROMIUM_FLAGS: --use-fake-ui-for-media-stream,--start-maximized,--kiosk,--enabled,--disable-infobars,--autoplay-policy=no-user-gesture-required

      # Public URL
      PUBLIC_URL: https://${XMPP_DOMAIN:-meet.jeffemmett.com}

      # Timezone
      TZ: UTC
    volumes:
      - recordings:/recordings
      - ./jibri/config:/config
      - /dev/shm:/dev/shm
    cap_add:
      - SYS_ADMIN
      - NET_BIND_SERVICE
    security_opt:
      - seccomp:unconfined
    shm_size: 2gb
    networks:
      - meeting-intelligence

volumes:
  postgres_data:
  redis_data:
  recordings:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/meetings/recordings
  audio_processed:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/meetings/audio
  whisper_models:

networks:
  meeting-intelligence:
    driver: bridge
  traefik-public:
    external: true
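
# --- Deployment sketch (illustrative) ---
# Assuming this file lives at /opt/meeting-intelligence/ alongside a .env
# copied from .env.example, the stack can be brought up and watched with:
#   docker compose up -d --build
#   docker compose logs -f api transcriber
# Note: the bind-mounted host paths (/opt/meetings/recordings and
# /opt/meetings/audio) must exist before the first start.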
@ -0,0 +1,104 @@
#!/bin/bash
# Jibri Recording Finalize Script
# Called when Jibri finishes recording a meeting
#
# Arguments:
#   $1 - Recording directory path (e.g., /recordings/<conference_id>/<timestamp>)
#
# This script:
#   1. Finds the recording file
#   2. Notifies the Meeting Intelligence API to start processing

set -e

RECORDING_DIR="$1"
API_URL="${MEETING_INTELLIGENCE_API:-http://api:8000}"
LOG_FILE="/var/log/jibri/finalize.log"

log() {
    echo "[$(date -Iseconds)] $1" >> "$LOG_FILE"
    echo "[$(date -Iseconds)] $1"
}

log "=== Finalize script started ==="
log "Recording directory: $RECORDING_DIR"

# Validate recording directory
if [ -z "$RECORDING_DIR" ] || [ ! -d "$RECORDING_DIR" ]; then
    log "ERROR: Invalid recording directory: $RECORDING_DIR"
    exit 1
fi

# Find the recording file (MP4 or WebM)
RECORDING_FILE=$(find "$RECORDING_DIR" -type f \( -name "*.mp4" -o -name "*.webm" \) | head -1)

if [ -z "$RECORDING_FILE" ]; then
    log "ERROR: No recording file found in $RECORDING_DIR"
    exit 1
fi

log "Found recording file: $RECORDING_FILE"

# Get file info
FILE_SIZE=$(stat -c%s "$RECORDING_FILE" 2>/dev/null || echo "0")
log "Recording file size: $FILE_SIZE bytes"

# Extract conference info from path
# Expected format: /recordings/<conference_id>/<timestamp>/recording.mp4
CONFERENCE_ID=$(echo "$RECORDING_DIR" | awk -F'/' '{print $(NF-1)}')
if [ -z "$CONFERENCE_ID" ]; then
    CONFERENCE_ID=$(basename "$(dirname "$RECORDING_DIR")")
fi

# Look for metadata file (Jibri sometimes creates this)
METADATA_FILE="$RECORDING_DIR/metadata.json"
if [ -f "$METADATA_FILE" ]; then
    log "Found metadata file: $METADATA_FILE"
    METADATA=$(cat "$METADATA_FILE")
else
    METADATA="{}"
fi

# Prepare webhook payload
PAYLOAD=$(cat <<EOF
{
    "event_type": "recording_completed",
    "conference_id": "$CONFERENCE_ID",
    "recording_path": "$RECORDING_FILE",
    "recording_dir": "$RECORDING_DIR",
    "file_size_bytes": $FILE_SIZE,
    "completed_at": "$(date -Iseconds)",
    "metadata": $METADATA
}
EOF
)

log "Sending webhook to $API_URL/webhooks/recording-complete"
log "Payload: $PAYLOAD"

# Send webhook to Meeting Intelligence API.
# "|| true" keeps "set -e" from killing the script on a network failure;
# the recording is already on disk and the call can be retried later.
RESPONSE=$(curl -s -w "\n%{http_code}" \
    -X POST \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD" \
    "$API_URL/webhooks/recording-complete" 2>&1 || true)

HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)

if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "201" ] || [ "$HTTP_CODE" = "202" ]; then
    log "SUCCESS: Webhook accepted (HTTP $HTTP_CODE)"
    log "Response: $BODY"
else
    log "WARNING: Webhook returned HTTP $HTTP_CODE"
    log "Response: $BODY"
    # Don't fail the script - the recording is still saved
    # The API can be retried later
fi

# Optional: Clean up old recordings (keep last 30 days)
# find /recordings -type f -mtime +30 -delete

log "=== Finalize script completed ==="
exit 0
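
# --- Wiring sketch (illustrative) ---
# The compose file mounts ./jibri/config at /config and points
# JIBRI_FINALIZE_RECORDING_SCRIPT_PATH at /config/finalize.sh, so this file
# must be executable on the host before the container starts:
#   chmod +x jibri/config/finalize.sh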
@ -0,0 +1,310 @@
-- Meeting Intelligence System - PostgreSQL Schema
-- Uses pgvector extension for semantic search

-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "vector";

-- ============================================================
-- Meetings Table
-- ============================================================
CREATE TABLE meetings (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    conference_id VARCHAR(255) NOT NULL,
    conference_name VARCHAR(255),
    title VARCHAR(500),
    started_at TIMESTAMP WITH TIME ZONE,
    ended_at TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    recording_path VARCHAR(1000),
    audio_path VARCHAR(1000),
    status VARCHAR(50) DEFAULT 'recording',
    -- Status: 'recording', 'extracting_audio', 'transcribing', 'diarizing', 'summarizing', 'ready', 'failed'
    error_message TEXT,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_meetings_conference_id ON meetings(conference_id);
CREATE INDEX idx_meetings_status ON meetings(status);
CREATE INDEX idx_meetings_started_at ON meetings(started_at DESC);
CREATE INDEX idx_meetings_created_at ON meetings(created_at DESC);

-- ============================================================
-- Meeting Participants
-- ============================================================
CREATE TABLE meeting_participants (
    id SERIAL PRIMARY KEY,
    meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    participant_id VARCHAR(255) NOT NULL,
    display_name VARCHAR(255),
    email VARCHAR(255),
    joined_at TIMESTAMP WITH TIME ZONE,
    left_at TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    is_moderator BOOLEAN DEFAULT FALSE,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_participants_meeting_id ON meeting_participants(meeting_id);
CREATE INDEX idx_participants_participant_id ON meeting_participants(participant_id);

-- ============================================================
-- Transcripts
-- ============================================================
CREATE TABLE transcripts (
    id SERIAL PRIMARY KEY,
    meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    segment_index INTEGER NOT NULL,
    start_time FLOAT NOT NULL,
    end_time FLOAT NOT NULL,
    speaker_id VARCHAR(255),
    speaker_name VARCHAR(255),
    speaker_label VARCHAR(50), -- e.g., "Speaker 1", "Speaker 2"
    text TEXT NOT NULL,
    confidence FLOAT,
    language VARCHAR(10) DEFAULT 'en',
    word_timestamps JSONB, -- Array of {word, start, end, confidence}
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_transcripts_meeting_id ON transcripts(meeting_id);
CREATE INDEX idx_transcripts_speaker_id ON transcripts(speaker_id);
CREATE INDEX idx_transcripts_start_time ON transcripts(meeting_id, start_time);
CREATE INDEX idx_transcripts_text_search ON transcripts USING gin(to_tsvector('english', text));

-- ============================================================
-- Transcript Embeddings (for semantic search)
-- ============================================================
CREATE TABLE transcript_embeddings (
    id SERIAL PRIMARY KEY,
    transcript_id INTEGER NOT NULL REFERENCES transcripts(id) ON DELETE CASCADE,
    meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    embedding vector(384), -- all-MiniLM-L6-v2 dimensions
    chunk_text TEXT, -- The text chunk this embedding represents
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_embeddings_transcript_id ON transcript_embeddings(transcript_id);
CREATE INDEX idx_embeddings_meeting_id ON transcript_embeddings(meeting_id);
CREATE INDEX idx_embeddings_vector ON transcript_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- ============================================================
-- AI Summaries
-- ============================================================
CREATE TABLE summaries (
    id SERIAL PRIMARY KEY,
    meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    summary_text TEXT,
    key_points JSONB, -- Array of key point strings
    action_items JSONB, -- Array of {task, assignee, due_date, completed}
    decisions JSONB, -- Array of decision strings
    topics JSONB, -- Array of {topic, duration_seconds, relevance_score}
    sentiment VARCHAR(50), -- 'positive', 'neutral', 'negative', 'mixed'
    sentiment_scores JSONB, -- {positive: 0.7, neutral: 0.2, negative: 0.1}
    participants_summary JSONB, -- {participant_id: {speaking_time, word_count, topics}}
    model_used VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    generated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'
);

CREATE INDEX idx_summaries_meeting_id ON summaries(meeting_id);
CREATE INDEX idx_summaries_generated_at ON summaries(generated_at DESC);

-- ============================================================
-- Processing Jobs Queue
-- ============================================================
CREATE TABLE processing_jobs (
    id SERIAL PRIMARY KEY,
    meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    job_type VARCHAR(50) NOT NULL, -- 'extract_audio', 'transcribe', 'diarize', 'summarize', 'embed'
    status VARCHAR(50) DEFAULT 'pending', -- 'pending', 'processing', 'completed', 'failed', 'cancelled'
    priority INTEGER DEFAULT 5, -- 1 = highest, 10 = lowest
    attempts INTEGER DEFAULT 0,
    max_attempts INTEGER DEFAULT 3,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    error_message TEXT,
    result JSONB,
    worker_id VARCHAR(100),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_jobs_meeting_id ON processing_jobs(meeting_id);
CREATE INDEX idx_jobs_status ON processing_jobs(status, priority, created_at);
CREATE INDEX idx_jobs_type_status ON processing_jobs(job_type, status);

-- ============================================================
-- Search History (for analytics)
-- ============================================================
CREATE TABLE search_history (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255),
    query TEXT NOT NULL,
    search_type VARCHAR(50), -- 'text', 'semantic', 'combined'
    results_count INTEGER,
    meeting_ids UUID[],
    filters JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_search_history_created_at ON search_history(created_at DESC);

-- ============================================================
-- Webhook Events (for Jibri callbacks)
-- ============================================================
CREATE TABLE webhook_events (
    id SERIAL PRIMARY KEY,
    event_type VARCHAR(100) NOT NULL,
    payload JSONB NOT NULL,
    processed BOOLEAN DEFAULT FALSE,
    processed_at TIMESTAMP WITH TIME ZONE,
    error_message TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_webhooks_processed ON webhook_events(processed, created_at);

-- ============================================================
-- Functions
-- ============================================================

-- Update timestamp trigger
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER meetings_updated_at
    BEFORE UPDATE ON meetings
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at();

CREATE TRIGGER jobs_updated_at
    BEFORE UPDATE ON processing_jobs
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at();

-- Semantic search function
CREATE OR REPLACE FUNCTION semantic_search(
    query_embedding vector(384),
    match_threshold FLOAT DEFAULT 0.7,
    match_count INT DEFAULT 10,
    meeting_filter UUID DEFAULT NULL
)
RETURNS TABLE (
    transcript_id INT,
    meeting_id UUID,
    chunk_text TEXT,
    similarity FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        te.transcript_id,
        te.meeting_id,
        te.chunk_text,
        1 - (te.embedding <=> query_embedding) AS similarity
    FROM transcript_embeddings te
    WHERE
        (meeting_filter IS NULL OR te.meeting_id = meeting_filter)
        AND 1 - (te.embedding <=> query_embedding) > match_threshold
    ORDER BY te.embedding <=> query_embedding
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;

-- Full-text search function
CREATE OR REPLACE FUNCTION fulltext_search(
    search_query TEXT,
    meeting_filter UUID DEFAULT NULL,
    match_count INT DEFAULT 50
)
RETURNS TABLE (
    transcript_id INT,
    meeting_id UUID,
    text TEXT,
    speaker_name VARCHAR,
    start_time FLOAT,
    rank FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        t.id,
        t.meeting_id,
        t.text,
        t.speaker_name,
        t.start_time,
        -- Cast: ts_rank() returns real, the declared column is double precision
        ts_rank(to_tsvector('english', t.text), plainto_tsquery('english', search_query))::FLOAT AS rank
    FROM transcripts t
    WHERE
        (meeting_filter IS NULL OR t.meeting_id = meeting_filter)
        AND to_tsvector('english', t.text) @@ plainto_tsquery('english', search_query)
    ORDER BY rank DESC
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;

-- ============================================================
-- Views
-- ============================================================

-- Meeting overview with stats
CREATE VIEW meeting_overview AS
SELECT
    m.id,
    m.conference_id,
    m.conference_name,
    m.title,
    m.started_at,
    m.ended_at,
    m.duration_seconds,
    m.status,
    m.recording_path,
    COUNT(DISTINCT mp.id) AS participant_count,
    COUNT(DISTINCT t.id) AS transcript_segment_count,
    COALESCE(SUM(LENGTH(t.text)), 0) AS total_characters,
    s.id IS NOT NULL AS has_summary,
    m.created_at
FROM meetings m
LEFT JOIN meeting_participants mp ON m.id = mp.meeting_id
LEFT JOIN transcripts t ON m.id = t.meeting_id
LEFT JOIN summaries s ON m.id = s.meeting_id
GROUP BY m.id, s.id;

-- Speaker stats per meeting
CREATE VIEW speaker_stats AS
SELECT
    t.meeting_id,
    t.speaker_id,
    t.speaker_name,
    t.speaker_label,
    COUNT(*) AS segment_count,
    SUM(t.end_time - t.start_time) AS speaking_time_seconds,
    SUM(LENGTH(t.text)) AS character_count,
    SUM(array_length(regexp_split_to_array(t.text, '\s+'), 1)) AS word_count
FROM transcripts t
GROUP BY t.meeting_id, t.speaker_id, t.speaker_name, t.speaker_label;

-- ============================================================
-- Sample Data (for testing - remove in production)
-- ============================================================

-- INSERT INTO meetings (conference_id, conference_name, title, started_at, status)
-- VALUES ('test-room-123', 'Test Room', 'Test Meeting', NOW() - INTERVAL '1 hour', 'ready');

COMMENT ON TABLE meetings IS 'Stores meeting metadata and processing status';
COMMENT ON TABLE transcripts IS 'Stores time-stamped transcript segments with speaker attribution';
COMMENT ON TABLE summaries IS 'Stores AI-generated meeting summaries and extracted information';
COMMENT ON TABLE transcript_embeddings IS 'Stores vector embeddings for semantic search';
COMMENT ON TABLE processing_jobs IS 'Job queue for async processing tasks';
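
-- --- Usage sketch (illustrative queries) ---
-- Full-text search across all meetings:
--   SELECT meeting_id, speaker_name, text, rank
--   FROM fulltext_search('action items', NULL, 10);
-- Semantic search expects a 384-dim embedding (e.g. from all-MiniLM-L6-v2),
-- typically passed as a bound parameter by the API:
--   SELECT * FROM semantic_search($1, 0.7, 10, NULL);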
@ -0,0 +1,67 @@
# Meeting Intelligence Transcription Service
# Uses whisper.cpp for fast CPU-based transcription
# Uses resemblyzer for speaker diarization

FROM python:3.11-slim AS builder

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Build whisper.cpp (static libs so the copied binary is self-contained)
WORKDIR /build
RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    cmake -B build -DWHISPER_BUILD_EXAMPLES=ON -DBUILD_SHARED_LIBS=OFF && \
    cmake --build build --config Release -j$(nproc) && \
    cp build/bin/whisper-cli /usr/local/bin/whisper && \
    cp build/bin/whisper-server /usr/local/bin/whisper-server 2>/dev/null || true

# Download whisper models
WORKDIR /models
RUN cd /build/whisper.cpp && \
    bash models/download-ggml-model.sh small && \
    mv models/ggml-small.bin /models/

# Production image
FROM python:3.11-slim

# Install runtime dependencies (curl is required by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    libsndfile1 \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy whisper binary and models
COPY --from=builder /usr/local/bin/whisper /usr/local/bin/whisper
COPY --from=builder /models /models

# Set up Python environment
WORKDIR /app

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/

# Create directories
RUN mkdir -p /recordings /audio /logs

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV WHISPER_MODEL=/models/ggml-small.bin
ENV WHISPER_THREADS=8

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8001/health || exit 1

# Run the service
EXPOSE 8001
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001", "--workers", "1"]
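
# --- Build/run sketch (image name is illustrative) ---
#   docker build -t meeting-intelligence-transcriber ./transcriber
#   docker run --rm -p 8001:8001 meeting-intelligence-transcriber
# In normal operation the compose file builds and runs this image instead.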
@ -0,0 +1 @@
# Meeting Intelligence Transcription Service
@ -0,0 +1,45 @@
"""
Configuration settings for the Transcription Service.
"""

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    # Redis configuration
    redis_url: str = "redis://localhost:6379"

    # PostgreSQL configuration
    postgres_url: str = "postgresql://meeting_intelligence:changeme@localhost:5432/meeting_intelligence"

    # Whisper configuration
    whisper_model: str = "/models/ggml-small.bin"
    whisper_threads: int = 8
    whisper_language: str = "en"

    # Worker configuration
    num_workers: int = 4
    job_timeout: int = 7200  # 2 hours in seconds

    # Audio processing
    audio_sample_rate: int = 16000
    audio_channels: int = 1

    # Diarization settings
    min_speaker_duration: float = 0.5  # Minimum speaker segment in seconds
    max_speakers: int = 10

    # Paths
    recordings_path: str = "/recordings"
    audio_output_path: str = "/audio"
    temp_path: str = "/tmp/transcriber"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"


settings = Settings()
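
# --- Override sketch (illustrative) ---
# pydantic-settings matches environment variables to fields
# case-insensitively, so e.g.:
#   WHISPER_THREADS=16 NUM_WORKERS=2 uvicorn app.main:app --port 8001
# runs with 16 whisper threads and 2 worker tasks.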
@ -0,0 +1,245 @@
"""
Database operations for the Transcription Service.
"""

import json
import uuid
from typing import Optional, List, Dict, Any

import asyncpg
import structlog

log = structlog.get_logger()


async def _init_connection(conn: asyncpg.Connection):
    """Encode/decode jsonb columns as Python dicts automatically."""
    await conn.set_type_codec(
        "jsonb",
        encoder=json.dumps,
        decoder=json.loads,
        schema="pg_catalog"
    )


class Database:
    """Database operations for transcription service."""

    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self):
        """Establish database connection pool."""
        log.info("Connecting to database...")
        self.pool = await asyncpg.create_pool(
            self.connection_string,
            min_size=2,
            max_size=10,
            init=_init_connection
        )
        log.info("Database connected")

    async def disconnect(self):
        """Close database connection pool."""
        if self.pool:
            await self.pool.close()
            log.info("Database disconnected")

    async def health_check(self):
        """Check database connectivity."""
        async with self.pool.acquire() as conn:
            await conn.fetchval("SELECT 1")

    async def create_transcription_job(
        self,
        meeting_id: str,
        audio_path: Optional[str] = None,
        video_path: Optional[str] = None,
        enable_diarization: bool = True,
        language: Optional[str] = None,
        priority: int = 5
    ) -> str:
        """Create a new transcription job. Returns the DB-generated job id."""
        async with self.pool.acquire() as conn:
            # processing_jobs.id is SERIAL, so let the database assign it
            job_id = await conn.fetchval("""
                INSERT INTO processing_jobs (
                    meeting_id, job_type, status, priority, result
                )
                VALUES ($1::uuid, 'transcribe', 'pending', $2, $3)
                RETURNING id
            """, meeting_id, priority, {
                "audio_path": audio_path,
                "video_path": video_path,
                "enable_diarization": enable_diarization,
                "language": language
            })

        log.info("Created transcription job", job_id=job_id, meeting_id=meeting_id)
        return str(job_id)

    async def get_job(self, job_id: str) -> Optional[Dict[str, Any]]:
        """Get a job by ID."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                SELECT id, meeting_id, job_type, status, priority,
                       attempts, started_at, completed_at,
                       error_message, result, created_at
                FROM processing_jobs
                WHERE id = $1
            """, int(job_id))

            if row:
                return dict(row)
            return None

    async def get_next_pending_job(self) -> Optional[Dict[str, Any]]:
        """Get the next pending job and mark it as processing."""
        async with self.pool.acquire() as conn:
            # Use FOR UPDATE SKIP LOCKED to prevent race conditions
            row = await conn.fetchrow("""
                UPDATE processing_jobs
                SET status = 'processing',
                    started_at = NOW(),
                    attempts = attempts + 1
                WHERE id = (
                    SELECT id FROM processing_jobs
                    WHERE status = 'pending'
                      AND job_type = 'transcribe'
                    ORDER BY priority ASC, created_at ASC
                    FOR UPDATE SKIP LOCKED
                    LIMIT 1
                )
                RETURNING id, meeting_id, job_type, result
            """)

            if row:
                result = dict(row)
                # Merge result JSON into the dict
                if result.get("result"):
                    result.update(result["result"])
                return result
            return None

    async def update_job_status(
        self,
        job_id: str,
        status: str,
        error_message: Optional[str] = None,
        result: Optional[dict] = None,
        progress: Optional[float] = None
    ):
        """Update job status."""
        async with self.pool.acquire() as conn:
            if status == "completed":
                await conn.execute("""
                    UPDATE processing_jobs
                    SET status = $1,
                        completed_at = NOW(),
                        error_message = $2,
                        result = COALESCE($3::jsonb, result)
                    WHERE id = $4
                """, status, error_message, result, int(job_id))
            else:
                update_result = result
                if progress is not None:
                    update_result = result or {}
                    update_result["progress"] = progress

                await conn.execute("""
                    UPDATE processing_jobs
                    SET status = $1,
                        error_message = $2,
                        result = COALESCE($3::jsonb, result)
                    WHERE id = $4
                """, status, error_message, update_result, int(job_id))

    async def update_job_audio_path(self, job_id: str, audio_path: str):
        """Update the audio path for a job."""
        async with self.pool.acquire() as conn:
            await conn.execute("""
                UPDATE processing_jobs
                SET result = result || $1::jsonb
                WHERE id = $2
            """, {"audio_path": audio_path}, int(job_id))

    async def update_meeting_status(self, meeting_id: str, status: str):
        """Update meeting processing status."""
        async with self.pool.acquire() as conn:
            await conn.execute("""
                UPDATE meetings
                SET status = $1,
                    updated_at = NOW()
                WHERE id = $2::uuid
            """, status, meeting_id)

    async def insert_transcript_segment(
        self,
        meeting_id: str,
        segment_index: int,
        start_time: float,
        end_time: float,
        text: str,
        speaker_id: Optional[str] = None,
        speaker_label: Optional[str] = None,
        confidence: Optional[float] = None,
        language: str = "en"
    ):
        """Insert a transcript segment."""
        async with self.pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO transcripts (
                    meeting_id, segment_index, start_time, end_time,
                    text, speaker_id, speaker_label, confidence, language
                )
                VALUES ($1::uuid, $2, $3, $4, $5, $6, $7, $8, $9)
            """, meeting_id, segment_index, start_time, end_time,
                text, speaker_id, speaker_label, confidence, language)

    async def get_transcript(self, meeting_id: str) -> List[Dict[str, Any]]:
        """Get all transcript segments for a meeting."""
        async with self.pool.acquire() as conn:
            rows = await conn.fetch("""
                SELECT id, segment_index, start_time, end_time,
                       speaker_id, speaker_label, text, confidence, language
                FROM transcripts
                WHERE meeting_id = $1::uuid
                ORDER BY segment_index ASC
            """, meeting_id)

            return [dict(row) for row in rows]

    async def get_meeting(self, meeting_id: str) -> Optional[Dict[str, Any]]:
        """Get meeting details."""
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow("""
                SELECT id, conference_id, conference_name, title,
                       started_at, ended_at, duration_seconds,
                       recording_path, audio_path, status,
                       metadata, created_at
                FROM meetings
                WHERE id = $1::uuid
            """, meeting_id)

            if row:
                return dict(row)
            return None

    async def create_meeting(
        self,
        conference_id: str,
        conference_name: Optional[str] = None,
        title: Optional[str] = None,
        recording_path: Optional[str] = None,
        metadata: Optional[dict] = None
    ) -> str:
        """Create a new meeting record."""
        meeting_id = str(uuid.uuid4())

        async with self.pool.acquire() as conn:
            await conn.execute("""
                INSERT INTO meetings (
                    id, conference_id, conference_name, title,
                    recording_path, status, metadata
                )
                VALUES ($1, $2, $3, $4, $5, 'recording', $6)
            """, meeting_id, conference_id, conference_name, title,
                recording_path, metadata or {})

        log.info("Created meeting", meeting_id=meeting_id, conference_id=conference_id)
        return meeting_id


class DatabaseError(Exception):
    """Database operation error."""
    pass
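
# --- Usage sketch (illustrative) ---
#   db = Database(settings.postgres_url)
#   await db.connect()
#   job = await db.get_next_pending_job()   # None when the queue is empty
#   if job:
#       await db.update_job_status(job["id"], "completed")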
@ -0,0 +1,338 @@
"""
Speaker Diarization using resemblyzer.

Identifies who spoke when in the audio.
"""

import os
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np
import soundfile as sf
from resemblyzer import VoiceEncoder, preprocess_wav
from sklearn.cluster import AgglomerativeClustering

import structlog

log = structlog.get_logger()


@dataclass
class SpeakerSegment:
    """A segment attributed to a speaker."""
    start: float
    end: float
    speaker_id: str
    speaker_label: str  # e.g., "Speaker 1"
    confidence: Optional[float] = None


class SpeakerDiarizer:
    """Speaker diarization using voice embeddings."""

    def __init__(
        self,
        min_segment_duration: float = 0.5,
        max_speakers: int = 10,
        embedding_step: float = 0.5  # Step size for embeddings in seconds
    ):
        self.min_segment_duration = min_segment_duration
        self.max_speakers = max_speakers
        self.embedding_step = embedding_step

        # Load voice encoder (this downloads the model on first use)
        log.info("Loading voice encoder model...")
        self.encoder = VoiceEncoder()
        log.info("Voice encoder loaded")

    def diarize(
        self,
        audio_path: str,
        num_speakers: Optional[int] = None,
        transcript_segments: Optional[List[dict]] = None
    ) -> List[SpeakerSegment]:
        """
        Perform speaker diarization on an audio file.

        Args:
            audio_path: Path to audio file (WAV, 16kHz mono)
            num_speakers: Number of speakers (if known), otherwise auto-detected
            transcript_segments: Optional transcript segments to align with

        Returns:
            List of SpeakerSegment with speaker attributions
        """
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        log.info("Starting speaker diarization", audio_path=audio_path)

        # Load and preprocess audio
        wav, sample_rate = sf.read(audio_path)

        if sample_rate != 16000:
            log.warning(f"Audio sample rate is {sample_rate}, expected 16000")

        # Ensure mono
        if len(wav.shape) > 1:
            wav = wav.mean(axis=1)

        # Preprocess for resemblyzer; passing source_sr lets it resample to
        # its 16 kHz working rate, so all timestamps below use that rate
        wav = preprocess_wav(wav, source_sr=sample_rate)
        sample_rate = 16000

        if len(wav) == 0:
            log.warning("Audio file is empty after preprocessing")
            return []

        # Generate embeddings for sliding windows
        embeddings, timestamps = self._generate_embeddings(wav, sample_rate)

        if len(embeddings) == 0:
            log.warning("No embeddings generated")
            return []

        # Cluster embeddings to identify speakers
        speaker_labels = self._cluster_speakers(
            embeddings,
            num_speakers=num_speakers
        )

        # Convert to speaker segments
        segments = self._create_segments(timestamps, speaker_labels)

        # If transcript segments provided, align them
        if transcript_segments:
            segments = self._align_with_transcript(segments, transcript_segments)

        log.info(
            "Diarization complete",
            num_segments=len(segments),
            num_speakers=len(set(s.speaker_id for s in segments))
        )

        return segments

    def _generate_embeddings(
        self,
        wav: np.ndarray,
        sample_rate: int
    ) -> Tuple[np.ndarray, List[float]]:
        """Generate voice embeddings for sliding windows."""
        embeddings = []
        timestamps = []

        # Window size in samples (1.5 seconds for good speaker representation)
        window_size = int(1.5 * sample_rate)
        step_size = int(self.embedding_step * sample_rate)

        # Slide through audio
        for start_sample in range(0, len(wav) - window_size, step_size):
            end_sample = start_sample + window_size
            window = wav[start_sample:end_sample]

            # Get embedding for this window
            try:
                embedding = self.encoder.embed_utterance(window)
                embeddings.append(embedding)
                timestamps.append(start_sample / sample_rate)
            except Exception as e:
                log.debug(f"Failed to embed window at {start_sample/sample_rate}s: {e}")
                continue

        return np.array(embeddings), timestamps

    def _cluster_speakers(
        self,
        embeddings: np.ndarray,
        num_speakers: Optional[int] = None
    ) -> np.ndarray:
        """Cluster embeddings to identify speakers."""
        if len(embeddings) == 0:
            return np.array([])

        # If number of speakers not specified, estimate it
        if num_speakers is None:
            num_speakers = self._estimate_num_speakers(embeddings)

        # Ensure we don't exceed max speakers or embedding count
        num_speakers = min(num_speakers, self.max_speakers, len(embeddings))
        num_speakers = max(num_speakers, 1)

        log.info(f"Clustering with {num_speakers} speakers")

        # Use agglomerative clustering
        clustering = AgglomerativeClustering(
            n_clusters=num_speakers,
            metric="cosine",
            linkage="average"
        )

        labels = clustering.fit_predict(embeddings)

        return labels

    def _estimate_num_speakers(self, embeddings: np.ndarray) -> int:
        """Estimate the number of speakers from embeddings."""
        if len(embeddings) < 2:
            return 1

        # Try different numbers of clusters and find the best
        best_score = -1
        best_n = 2

        for n in range(2, min(6, len(embeddings))):
            try:
                clustering = AgglomerativeClustering(
                    n_clusters=n,
                    metric="cosine",
                    linkage="average"
                )
                labels = clustering.fit_predict(embeddings)

                # Calculate silhouette-like score
                score = self._cluster_quality_score(embeddings, labels)

                if score > best_score:
                    best_score = score
                    best_n = n
            except Exception:
                continue

        log.info(f"Estimated {best_n} speakers (score: {best_score:.3f})")
        return best_n

    def _cluster_quality_score(
        self,
        embeddings: np.ndarray,
        labels: np.ndarray
    ) -> float:
        """Calculate a simple cluster quality score."""
        unique_labels = np.unique(labels)

        if len(unique_labels) < 2:
            return 0.0

        # Calculate average intra-cluster distance
        intra_distances = []
        for label in unique_labels:
            cluster_embeddings = embeddings[labels == label]
            if len(cluster_embeddings) > 1:
                # Cosine distance within cluster
                for i in range(len(cluster_embeddings)):
                    for j in range(i + 1, len(cluster_embeddings)):
                        dist = 1 - np.dot(cluster_embeddings[i], cluster_embeddings[j])
                        intra_distances.append(dist)

        if not intra_distances:
            return 0.0

        avg_intra = np.mean(intra_distances)

        # Calculate average inter-cluster distance
        inter_distances = []
        cluster_centers = []
        for label in unique_labels:
            cluster_embeddings = embeddings[labels == label]
            center = cluster_embeddings.mean(axis=0)
            cluster_centers.append(center)

        for i in range(len(cluster_centers)):
            for j in range(i + 1, len(cluster_centers)):
                dist = 1 - np.dot(cluster_centers[i], cluster_centers[j])
                inter_distances.append(dist)

        avg_inter = np.mean(inter_distances) if inter_distances else 1.0

        # Score: higher inter-cluster distance, lower intra-cluster distance is better
        return (avg_inter - avg_intra) / max(avg_inter, avg_intra, 0.001)

    def _create_segments(
        self,
        timestamps: List[float],
        labels: np.ndarray
    ) -> List[SpeakerSegment]:
        """Convert clustered timestamps to speaker segments."""
        if len(timestamps) == 0:
            return []

        segments = []
        current_speaker = labels[0]
        segment_start = timestamps[0]

        for i in range(1, len(timestamps)):
            if labels[i] != current_speaker:
                # End current segment
                segment_end = timestamps[i]

                if segment_end - segment_start >= self.min_segment_duration:
                    segments.append(SpeakerSegment(
                        start=segment_start,
                        end=segment_end,
                        speaker_id=f"speaker_{current_speaker}",
                        speaker_label=f"Speaker {current_speaker + 1}"
                    ))

                # Start new segment
                current_speaker = labels[i]
                segment_start = timestamps[i]

        # Add final segment
        segment_end = timestamps[-1] + self.embedding_step
        if segment_end - segment_start >= self.min_segment_duration:
            segments.append(SpeakerSegment(
                start=segment_start,
                end=segment_end,
                speaker_id=f"speaker_{current_speaker}",
                speaker_label=f"Speaker {current_speaker + 1}"
            ))

        return segments

    def _align_with_transcript(
        self,
        speaker_segments: List[SpeakerSegment],
        transcript_segments: List[dict]
    ) -> List[SpeakerSegment]:
        """Align speaker segments with transcript segments."""
        aligned = []

        for trans in transcript_segments:
            trans_start = trans.get("start", 0)
            trans_end = trans.get("end", 0)

            # Find the speaker segment that best overlaps
            best_speaker = None
            best_overlap = 0

            for speaker in speaker_segments:
                # Calculate overlap
                overlap_start = max(trans_start, speaker.start)
                overlap_end = min(trans_end, speaker.end)
                overlap = max(0, overlap_end - overlap_start)

                if overlap > best_overlap:
                    best_overlap = overlap
                    best_speaker = speaker

            if best_speaker:
                aligned.append(SpeakerSegment(
                    start=trans_start,
                    end=trans_end,
                    speaker_id=best_speaker.speaker_id,
                    speaker_label=best_speaker.speaker_label,
                    confidence=best_overlap / (trans_end - trans_start) if trans_end > trans_start else 0
                ))
            else:
                # No match, assign unknown speaker
                aligned.append(SpeakerSegment(
                    start=trans_start,
                    end=trans_end,
                    speaker_id="speaker_unknown",
                    speaker_label="Unknown Speaker",
                    confidence=0
                ))

        return aligned
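
# --- Usage sketch (paths are illustrative) ---
#   diarizer = SpeakerDiarizer(max_speakers=6)
#   segments = diarizer.diarize("/audio/meeting.wav")
#   for seg in segments:
#       print(f"{seg.speaker_label}: {seg.start:.1f}s - {seg.end:.1f}s")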
@ -0,0 +1,274 @@
|
|||
"""
|
||||
Meeting Intelligence Transcription Service
|
||||
|
||||
FastAPI service that handles:
|
||||
- Audio extraction from video recordings
|
||||
- Transcription using whisper.cpp
|
||||
- Speaker diarization using resemblyzer
|
||||
- Job queue management via Redis
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
from contextlib import asynccontextmanager
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import FastAPI, BackgroundTasks, HTTPException
|
||||
from fastapi.responses import JSONResponse
|
||||
from pydantic import BaseModel
|
||||
from redis import Redis
|
||||
from rq import Queue
|
||||
|
||||
from .config import settings
|
||||
from .transcriber import WhisperTranscriber
|
||||
from .diarizer import SpeakerDiarizer
|
||||
from .processor import JobProcessor
|
||||
from .database import Database
|
||||
|
||||
import structlog
|
||||
|
||||
log = structlog.get_logger()
|
||||
|
||||
|
||||
# Pydantic models
|
||||
class TranscribeRequest(BaseModel):
|
||||
meeting_id: str
|
||||
audio_path: str
|
||||
priority: int = 5
|
||||
enable_diarization: bool = True
|
||||
language: Optional[str] = None
|
||||
|
||||
|
||||
class TranscribeResponse(BaseModel):
|
||||
job_id: str
|
||||
status: str
|
||||
message: str
|
||||
|
||||
|
||||
class JobStatus(BaseModel):
|
||||
job_id: str
|
||||
status: str
|
||||
progress: Optional[float] = None
|
||||
result: Optional[dict] = None
|
||||
error: Optional[str] = None
|
||||
|
||||
|
||||
# Application state
|
||||
class AppState:
|
||||
redis: Optional[Redis] = None
|
||||
queue: Optional[Queue] = None
|
||||
db: Optional[Database] = None
|
||||
transcriber: Optional[WhisperTranscriber] = None
|
||||
diarizer: Optional[SpeakerDiarizer] = None
|
||||
processor: Optional[JobProcessor] = None
|
||||
|
||||
|
||||
state = AppState()
|
||||
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
"""Application startup and shutdown."""
|
||||
log.info("Starting transcription service...")
|
||||
|
||||
# Initialize Redis connection
|
||||
state.redis = Redis.from_url(settings.redis_url)
|
||||
state.queue = Queue("transcription", connection=state.redis)
|
||||
|
||||
# Initialize database
|
||||
state.db = Database(settings.postgres_url)
|
||||
await state.db.connect()
|
||||
|
||||
# Initialize transcriber
|
||||
state.transcriber = WhisperTranscriber(
|
||||
model_path=settings.whisper_model,
|
||||
threads=settings.whisper_threads
|
||||
)
|
||||
|
||||
# Initialize diarizer
|
||||
state.diarizer = SpeakerDiarizer()
|
||||
|
||||
# Initialize job processor
|
||||
state.processor = JobProcessor(
|
||||
transcriber=state.transcriber,
|
||||
diarizer=state.diarizer,
|
||||
db=state.db,
|
||||
redis=state.redis
|
||||
)
|
||||
|
||||
# Start background worker
|
||||
asyncio.create_task(state.processor.process_jobs())
|
||||
|
||||
log.info("Transcription service started successfully")
|
||||
|
||||
yield
|
||||
|
||||
# Shutdown
|
||||
log.info("Shutting down transcription service...")
|
||||
if state.processor:
|
||||
await state.processor.stop()
|
||||
if state.db:
|
||||
await state.db.disconnect()
|
||||
if state.redis:
|
||||
state.redis.close()
|
||||
|
||||
log.info("Transcription service stopped")
|
||||
|
||||
|
||||
app = FastAPI(
|
||||
title="Meeting Intelligence Transcription Service",
|
||||
description="Transcription and speaker diarization for meeting recordings",
|
||||
version="1.0.0",
|
||||
lifespan=lifespan
|
||||
)
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health_check():
|
||||
"""Health check endpoint."""
|
||||
redis_ok = False
|
||||
db_ok = False
|
||||
|
||||
try:
|
||||
if state.redis:
|
||||
state.redis.ping()
|
||||
redis_ok = True
|
||||
except Exception as e:
|
||||
log.error("Redis health check failed", error=str(e))
|
||||
|
||||
try:
|
||||
if state.db:
|
||||
await state.db.health_check()
|
||||
db_ok = True
|
||||
except Exception as e:
|
||||
log.error("Database health check failed", error=str(e))
|
||||
|
||||
status = "healthy" if (redis_ok and db_ok) else "unhealthy"
|
||||
|
||||
return {
|
||||
"status": status,
|
||||
"redis": redis_ok,
|
||||
"database": db_ok,
|
||||
"whisper_model": settings.whisper_model,
|
||||
"threads": settings.whisper_threads
|
||||
}
|
||||
|
||||
|
||||
@app.get("/status")
|
||||
async def service_status():
|
||||
"""Get service status and queue info."""
|
||||
queue_length = state.queue.count if state.queue else 0
|
||||
processing = state.processor.active_jobs if state.processor else 0
|
||||
|
||||
return {
|
||||
"status": "running",
|
||||
"queue_length": queue_length,
|
||||
"active_jobs": processing,
|
||||
"workers": settings.num_workers,
|
||||
"model": os.path.basename(settings.whisper_model)
|
||||
}
|
||||
|
||||
|
||||
@app.post("/transcribe", response_model=TranscribeResponse)
|
||||
async def queue_transcription(request: TranscribeRequest, background_tasks: BackgroundTasks):
|
||||
"""Queue a transcription job."""
|
||||
log.info(
|
||||
"Received transcription request",
|
||||
meeting_id=request.meeting_id,
|
||||
audio_path=request.audio_path
|
||||
)
|
||||
|
||||
# Validate audio file exists
|
||||
if not os.path.exists(request.audio_path):
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Audio file not found: {request.audio_path}"
|
||||
)
|
||||
|
||||
# Create job record in database
|
||||
try:
|
||||
job_id = await state.db.create_transcription_job(
|
||||
meeting_id=request.meeting_id,
|
||||
audio_path=request.audio_path,
|
||||
enable_diarization=request.enable_diarization,
|
||||
language=request.language,
|
||||
priority=request.priority
|
||||
)
|
||||
except Exception as e:
|
||||
log.error("Failed to create job", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
|
||||
# Queue the job
|
||||
state.queue.enqueue(
|
||||
"app.worker.process_transcription",
|
||||
job_id,
|
||||
job_timeout="2h",
|
||||
result_ttl=86400 # 24 hours
|
||||
)
|
||||
|
||||
log.info("Job queued", job_id=job_id)
|
||||
|
||||
return TranscribeResponse(
|
||||
job_id=job_id,
|
||||
status="queued",
|
||||
message="Transcription job queued successfully"
|
||||
)
|
||||
|
||||
|
||||
@app.get("/transcribe/{job_id}", response_model=JobStatus)
|
||||
async def get_job_status(job_id: str):
|
||||
"""Get the status of a transcription job."""
|
||||
job = await state.db.get_job(job_id)
|
||||
|
||||
if not job:
|
||||
raise HTTPException(status_code=404, detail="Job not found")
|
||||
|
||||
return JobStatus(
|
||||
job_id=job_id,
|
||||
status=job["status"],
|
||||
progress=job.get("progress"),
|
||||
result=job.get("result"),
|
||||
error=job.get("error_message")
|
||||
)
|
||||
|
||||
|
||||
@app.delete("/transcribe/{job_id}")
|
||||
async def cancel_job(job_id: str):
|
||||
"""Cancel a pending transcription job."""
|
||||
job = await state.db.get_job(job_id)
|
||||
|
||||
if not job:
|
||||
raise HTTPException(status_code=404, detail="Job not found")
|
||||
|
||||
if job["status"] not in ["pending", "queued"]:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Cannot cancel job in status: {job['status']}"
|
||||
)
|
||||
|
||||
await state.db.update_job_status(job_id, "cancelled")
|
||||
|
||||
return {"status": "cancelled", "job_id": job_id}
|
||||
|
||||
|
||||
@app.get("/meetings/{meeting_id}/transcript")
|
||||
async def get_transcript(meeting_id: str):
|
||||
"""Get the transcript for a meeting."""
|
||||
transcript = await state.db.get_transcript(meeting_id)
|
||||
|
||||
if not transcript:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"No transcript found for meeting: {meeting_id}"
|
||||
)
|
||||
|
||||
return {
|
||||
"meeting_id": meeting_id,
|
||||
"segments": transcript,
|
||||
"segment_count": len(transcript)
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run(app, host="0.0.0.0", port=8001)
|
||||
|
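For orientation, here is a minimal client sketch against the job endpoints above, using the httpx dependency pinned later in this commit. The request fields mirror TranscribeRequest; the base URL assumes the transcriber's default port 8001 from the Components table, and the meeting ID and path are example values.

```
# Hypothetical client sketch for the /transcribe endpoints above.
# Assumes the service is reachable at localhost:8001.
import time
import httpx

BASE = "http://localhost:8001"

# Queue a transcription job (fields from TranscribeRequest)
resp = httpx.post(f"{BASE}/transcribe", json={
    "meeting_id": "room-2024-01-15",                         # example value
    "audio_path": "/recordings/room-2024-01-15/audio.wav",   # example value
    "enable_diarization": True,
    "language": "en",
})
resp.raise_for_status()
job_id = resp.json()["job_id"]

# Poll job status until it reaches a terminal state
while True:
    status = httpx.get(f"{BASE}/transcribe/{job_id}").json()
    if status["status"] in ("completed", "failed", "cancelled"):
        break
    time.sleep(5)
print(status)
```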
@ -0,0 +1,282 @@
"""
|
||||
Job Processor for the Transcription Service.
|
||||
|
||||
Handles the processing pipeline:
|
||||
1. Audio extraction from video
|
||||
2. Transcription
|
||||
3. Speaker diarization
|
||||
4. Database storage
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import subprocess
|
||||
from typing import Optional
|
||||
|
||||
import structlog
|
||||
|
||||
from .config import settings
|
||||
from .transcriber import WhisperTranscriber, TranscriptionResult
|
||||
from .diarizer import SpeakerDiarizer, SpeakerSegment
|
||||
from .database import Database
|
||||
|
||||
log = structlog.get_logger()
|
||||
|
||||
|
||||
class JobProcessor:
|
||||
"""Processes transcription jobs from the queue."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
transcriber: WhisperTranscriber,
|
||||
diarizer: SpeakerDiarizer,
|
||||
db: Database,
|
||||
redis
|
||||
):
|
||||
self.transcriber = transcriber
|
||||
self.diarizer = diarizer
|
||||
self.db = db
|
||||
self.redis = redis
|
||||
self.active_jobs = 0
|
||||
self._running = False
|
||||
self._workers = []
|
||||
|
||||
async def process_jobs(self):
|
||||
"""Main job processing loop."""
|
||||
self._running = True
|
||||
log.info("Job processor started", num_workers=settings.num_workers)
|
||||
|
||||
# Start worker tasks
|
||||
for i in range(settings.num_workers):
|
||||
worker = asyncio.create_task(self._worker(i))
|
||||
self._workers.append(worker)
|
||||
|
||||
# Wait for all workers
|
||||
await asyncio.gather(*self._workers, return_exceptions=True)
|
||||
|
||||
async def stop(self):
|
||||
"""Stop the job processor."""
|
||||
self._running = False
|
||||
for worker in self._workers:
|
||||
worker.cancel()
|
||||
log.info("Job processor stopped")
|
||||
|
||||
async def _worker(self, worker_id: int):
|
||||
"""Worker that processes individual jobs."""
|
||||
log.info(f"Worker {worker_id} started")
|
||||
|
||||
while self._running:
|
||||
try:
|
||||
# Get next job from database
|
||||
job = await self.db.get_next_pending_job()
|
||||
|
||||
if job is None:
|
||||
# No jobs, wait a bit
|
||||
await asyncio.sleep(2)
|
||||
continue
|
||||
|
||||
job_id = job["id"]
|
||||
meeting_id = job["meeting_id"]
|
||||
|
||||
log.info(
|
||||
f"Worker {worker_id} processing job",
|
||||
job_id=job_id,
|
||||
meeting_id=meeting_id
|
||||
)
|
||||
|
||||
self.active_jobs += 1
|
||||
|
||||
try:
|
||||
await self._process_job(job)
|
||||
except Exception as e:
|
||||
log.error(
|
||||
"Job processing failed",
|
||||
job_id=job_id,
|
||||
error=str(e)
|
||||
)
|
||||
await self.db.update_job_status(
|
||||
job_id,
|
||||
"failed",
|
||||
error_message=str(e)
|
||||
)
|
||||
finally:
|
||||
self.active_jobs -= 1
|
||||
|
||||
except asyncio.CancelledError:
|
||||
break
|
||||
except Exception as e:
|
||||
log.error(f"Worker {worker_id} error", error=str(e))
|
||||
await asyncio.sleep(5)
|
||||
|
||||
log.info(f"Worker {worker_id} stopped")
|
||||
|
||||
async def _process_job(self, job: dict):
|
||||
"""Process a single transcription job."""
|
||||
job_id = job["id"]
|
||||
meeting_id = job["meeting_id"]
|
||||
audio_path = job.get("audio_path")
|
||||
video_path = job.get("video_path")
|
||||
enable_diarization = job.get("enable_diarization", True)
|
||||
language = job.get("language")
|
||||
|
||||
# Update status to processing
|
||||
await self.db.update_job_status(job_id, "processing")
|
||||
await self.db.update_meeting_status(meeting_id, "transcribing")
|
||||
|
||||
# Step 1: Extract audio if we have video
|
||||
if video_path and not audio_path:
|
||||
log.info("Extracting audio from video", video_path=video_path)
|
||||
await self.db.update_job_status(job_id, "processing", progress=0.1)
|
||||
|
||||
audio_path = await self._extract_audio(video_path, meeting_id)
|
||||
await self.db.update_job_audio_path(job_id, audio_path)
|
||||
|
||||
if not audio_path or not os.path.exists(audio_path):
|
||||
raise RuntimeError(f"Audio file not found: {audio_path}")
|
||||
|
||||
# Step 2: Transcribe
|
||||
log.info("Starting transcription", audio_path=audio_path)
|
||||
await self.db.update_job_status(job_id, "processing", progress=0.3)
|
||||
|
||||
transcription = await asyncio.get_event_loop().run_in_executor(
|
||||
None,
|
||||
lambda: self.transcriber.transcribe(audio_path, language)
|
||||
)
|
||||
|
||||
log.info(
|
||||
"Transcription complete",
|
||||
segments=len(transcription.segments),
|
||||
duration=transcription.duration
|
||||
)
|
||||
|
||||
# Step 3: Speaker diarization
|
||||
speaker_segments = []
|
||||
if enable_diarization and len(transcription.segments) > 0:
|
||||
log.info("Starting speaker diarization")
|
||||
await self.db.update_job_status(job_id, "processing", progress=0.6)
|
||||
await self.db.update_meeting_status(meeting_id, "diarizing")
|
||||
|
||||
# Convert transcript segments to dicts for diarizer
|
||||
transcript_dicts = [
|
||||
{"start": s.start, "end": s.end, "text": s.text}
|
||||
for s in transcription.segments
|
||||
]
|
||||
|
||||
speaker_segments = await asyncio.get_event_loop().run_in_executor(
|
||||
None,
|
||||
lambda: self.diarizer.diarize(
|
||||
audio_path,
|
||||
transcript_segments=transcript_dicts
|
||||
)
|
||||
)
|
||||
|
||||
log.info(
|
||||
"Diarization complete",
|
||||
num_segments=len(speaker_segments),
|
||||
num_speakers=len(set(s.speaker_id for s in speaker_segments))
|
||||
)
|
||||
|
||||
# Step 4: Store results
|
||||
log.info("Storing transcript in database")
|
||||
await self.db.update_job_status(job_id, "processing", progress=0.9)
|
||||
|
||||
await self._store_transcript(
|
||||
meeting_id,
|
||||
transcription,
|
||||
speaker_segments
|
||||
)
|
||||
|
||||
# Mark job complete
|
||||
await self.db.update_job_status(
|
||||
job_id,
|
||||
"completed",
|
||||
result={
|
||||
"segments": len(transcription.segments),
|
||||
"duration": transcription.duration,
|
||||
"language": transcription.language,
|
||||
"speakers": len(set(s.speaker_id for s in speaker_segments)) if speaker_segments else 0
|
||||
}
|
||||
)
|
||||
|
||||
# Update meeting status - ready for summarization
|
||||
await self.db.update_meeting_status(meeting_id, "summarizing")
|
||||
|
||||
log.info("Job completed successfully", job_id=job_id)
|
||||
|
||||
async def _extract_audio(self, video_path: str, meeting_id: str) -> str:
|
||||
"""Extract audio from video file using ffmpeg."""
|
||||
output_dir = os.path.join(settings.audio_output_path, meeting_id)
|
||||
os.makedirs(output_dir, exist_ok=True)
|
||||
|
||||
audio_path = os.path.join(output_dir, "audio.wav")
|
||||
|
||||
cmd = [
|
||||
"ffmpeg",
|
||||
"-i", video_path,
|
||||
"-vn", # No video
|
||||
"-acodec", "pcm_s16le", # PCM 16-bit
|
||||
"-ar", str(settings.audio_sample_rate), # Sample rate
|
||||
"-ac", str(settings.audio_channels), # Mono
|
||||
"-y", # Overwrite
|
||||
audio_path
|
||||
]
|
||||
|
||||
log.debug("Running ffmpeg", cmd=" ".join(cmd))
|
||||
|
||||
process = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE
|
||||
)
|
||||
|
||||
_, stderr = await process.communicate()
|
||||
|
||||
if process.returncode != 0:
|
||||
raise RuntimeError(f"FFmpeg failed: {stderr.decode()}")
|
||||
|
||||
log.info("Audio extracted", output=audio_path)
|
||||
return audio_path
|
||||
|
||||
async def _store_transcript(
|
||||
self,
|
||||
meeting_id: str,
|
||||
transcription: TranscriptionResult,
|
||||
speaker_segments: list
|
||||
):
|
||||
"""Store transcript segments in database."""
|
||||
# Create a map from time ranges to speakers
|
||||
speaker_map = {}
|
||||
for seg in speaker_segments:
|
||||
speaker_map[(seg.start, seg.end)] = (seg.speaker_id, seg.speaker_label)
|
||||
|
||||
# Store each transcript segment
|
||||
for i, segment in enumerate(transcription.segments):
|
||||
# Find matching speaker
|
||||
speaker_id = None
|
||||
speaker_label = None
|
||||
|
||||
for (start, end), (sid, slabel) in speaker_map.items():
|
||||
if segment.start >= start and segment.end <= end:
|
||||
speaker_id = sid
|
||||
speaker_label = slabel
|
||||
break
|
||||
|
||||
# If no exact match, find closest overlap
|
||||
if speaker_id is None:
|
||||
for seg in speaker_segments:
|
||||
if segment.start < seg.end and segment.end > seg.start:
|
||||
speaker_id = seg.speaker_id
|
||||
speaker_label = seg.speaker_label
|
||||
break
|
||||
|
||||
await self.db.insert_transcript_segment(
|
||||
meeting_id=meeting_id,
|
||||
segment_index=i,
|
||||
start_time=segment.start,
|
||||
end_time=segment.end,
|
||||
text=segment.text,
|
||||
speaker_id=speaker_id,
|
||||
speaker_label=speaker_label,
|
||||
confidence=segment.confidence,
|
||||
language=transcription.language
|
||||
)
|
||||
|
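The speaker-attribution rule buried in `_store_transcript` deserves a plain statement: each transcript segment is first matched to a diarized segment that fully contains it, and only falls back to the first diarized segment it merely overlaps. A self-contained sketch of that rule (the data and names here are illustrative, not taken from the module):

```
# Illustrative sketch of the containment-then-overlap matching used in
# _store_transcript; tuples are (start, end, speaker_label) in seconds.
from typing import Optional

diarized = [(0.0, 12.5, "SPEAKER_00"), (12.5, 30.0, "SPEAKER_01")]

def match_speaker(seg_start: float, seg_end: float) -> Optional[str]:
    # Prefer a diarized segment that fully contains the transcript segment
    for start, end, label in diarized:
        if seg_start >= start and seg_end <= end:
            return label
    # Otherwise take the first diarized segment with any overlap
    for start, end, label in diarized:
        if seg_start < end and seg_end > start:
            return label
    return None

print(match_speaker(3.2, 7.9))    # SPEAKER_00 (fully contained)
print(match_speaker(11.0, 14.0))  # SPEAKER_00 (overlap fallback)
```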
@ -0,0 +1,211 @@
"""
|
||||
Whisper.cpp transcription wrapper.
|
||||
|
||||
Uses the whisper CLI to transcribe audio files.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
from dataclasses import dataclass
|
||||
from typing import List, Optional
|
||||
|
||||
import structlog
|
||||
|
||||
log = structlog.get_logger()
|
||||
|
||||
|
||||
@dataclass
|
||||
class TranscriptSegment:
|
||||
"""A single transcript segment."""
|
||||
start: float
|
||||
end: float
|
||||
text: str
|
||||
confidence: Optional[float] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class TranscriptionResult:
|
||||
"""Result of a transcription job."""
|
||||
segments: List[TranscriptSegment]
|
||||
language: str
|
||||
duration: float
|
||||
text: str
|
||||
|
||||
|
||||
class WhisperTranscriber:
|
||||
"""Wrapper for whisper.cpp transcription."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model_path: str = "/models/ggml-small.bin",
|
||||
threads: int = 8,
|
||||
language: str = "en"
|
||||
):
|
||||
self.model_path = model_path
|
||||
self.threads = threads
|
||||
self.language = language
|
||||
self.whisper_bin = "/usr/local/bin/whisper"
|
||||
|
||||
# Verify whisper binary exists
|
||||
if not os.path.exists(self.whisper_bin):
|
||||
raise RuntimeError(f"Whisper binary not found at {self.whisper_bin}")
|
||||
|
||||
# Verify model exists
|
||||
if not os.path.exists(model_path):
|
||||
raise RuntimeError(f"Whisper model not found at {model_path}")
|
||||
|
||||
log.info(
|
||||
"WhisperTranscriber initialized",
|
||||
model=model_path,
|
||||
threads=threads,
|
||||
language=language
|
||||
)
|
||||
|
||||
def transcribe(
|
||||
self,
|
||||
audio_path: str,
|
||||
language: Optional[str] = None,
|
||||
translate: bool = False
|
||||
) -> TranscriptionResult:
|
||||
"""
|
||||
Transcribe an audio file.
|
||||
|
||||
Args:
|
||||
audio_path: Path to the audio file (WAV format, 16kHz mono)
|
||||
language: Language code (e.g., 'en', 'es', 'fr') or None for auto-detect
|
||||
translate: If True, translate to English
|
||||
|
||||
Returns:
|
||||
TranscriptionResult with segments and full text
|
||||
"""
|
||||
if not os.path.exists(audio_path):
|
||||
raise FileNotFoundError(f"Audio file not found: {audio_path}")
|
||||
|
||||
log.info("Starting transcription", audio_path=audio_path, language=language)
|
||||
|
||||
# Create temp file for JSON output
|
||||
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
|
||||
output_json = tmp.name
|
||||
|
||||
try:
|
||||
# Build whisper command
|
||||
cmd = [
|
||||
self.whisper_bin,
|
||||
"-m", self.model_path,
|
||||
"-f", audio_path,
|
||||
"-t", str(self.threads),
|
||||
"-oj", # Output JSON
|
||||
"-of", output_json.replace(".json", ""), # Output file prefix
|
||||
"--print-progress",
|
||||
]
|
||||
|
||||
# Add language if specified
|
||||
if language:
|
||||
cmd.extend(["-l", language])
|
||||
else:
|
||||
cmd.extend(["-l", self.language])
|
||||
|
||||
# Add translate flag if needed
|
||||
if translate:
|
||||
cmd.append("--translate")
|
||||
|
||||
log.debug("Running whisper command", cmd=" ".join(cmd))
|
||||
|
||||
# Run whisper
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=7200 # 2 hour timeout
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
log.error(
|
||||
"Whisper transcription failed",
|
||||
returncode=result.returncode,
|
||||
stderr=result.stderr
|
||||
)
|
||||
raise RuntimeError(f"Whisper failed: {result.stderr}")
|
||||
|
||||
# Parse JSON output
|
||||
with open(output_json, "r") as f:
|
||||
whisper_output = json.load(f)
|
||||
|
||||
# Extract segments
|
||||
segments = []
|
||||
full_text_parts = []
|
||||
|
||||
for item in whisper_output.get("transcription", []):
|
||||
segment = TranscriptSegment(
|
||||
start=item["offsets"]["from"] / 1000.0, # Convert ms to seconds
|
||||
end=item["offsets"]["to"] / 1000.0,
|
||||
text=item["text"].strip(),
|
||||
confidence=item.get("confidence")
|
||||
)
|
||||
segments.append(segment)
|
||||
full_text_parts.append(segment.text)
|
||||
|
||||
# Get detected language
|
||||
detected_language = whisper_output.get("result", {}).get("language", language or self.language)
|
||||
|
||||
# Calculate total duration
|
||||
duration = segments[-1].end if segments else 0.0
|
||||
|
||||
log.info(
|
||||
"Transcription complete",
|
||||
segments=len(segments),
|
||||
duration=duration,
|
||||
language=detected_language
|
||||
)
|
||||
|
||||
return TranscriptionResult(
|
||||
segments=segments,
|
||||
language=detected_language,
|
||||
duration=duration,
|
||||
text=" ".join(full_text_parts)
|
||||
)
|
||||
|
||||
finally:
|
||||
# Clean up temp files
|
||||
for ext in [".json", ".txt", ".vtt", ".srt"]:
|
||||
tmp_file = output_json.replace(".json", ext)
|
||||
if os.path.exists(tmp_file):
|
||||
os.remove(tmp_file)
|
||||
|
||||
def transcribe_with_timestamps(
|
||||
self,
|
||||
audio_path: str,
|
||||
language: Optional[str] = None
|
||||
) -> List[dict]:
|
||||
"""
|
||||
Transcribe with word-level timestamps.
|
||||
|
||||
Returns list of dicts with word, start, end, confidence.
|
||||
"""
|
||||
result = self.transcribe(audio_path, language)
|
||||
|
||||
# Convert segments to word-level format
|
||||
# Note: whisper.cpp provides segment-level timestamps by default
|
||||
# For true word-level, we'd need the --max-len 1 flag but it's slower
|
||||
|
||||
words = []
|
||||
for segment in result.segments:
|
||||
# Estimate word timestamps within segment
|
||||
segment_words = segment.text.split()
|
||||
if not segment_words:
|
||||
continue
|
||||
|
||||
duration = segment.end - segment.start
|
||||
word_duration = duration / len(segment_words)
|
||||
|
||||
for i, word in enumerate(segment_words):
|
||||
words.append({
|
||||
"word": word,
|
||||
"start": segment.start + (i * word_duration),
|
||||
"end": segment.start + ((i + 1) * word_duration),
|
||||
"confidence": segment.confidence
|
||||
})
|
||||
|
||||
return words
|
||||
|
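For context, `transcribe()` expects whisper.cpp's `-oj` output to look roughly like the dict below: segment offsets in milliseconds under `transcription`, and the detected language under `result`. Sketched as a Python literal; the exact key set can vary across whisper.cpp versions.

```
# Roughly the whisper.cpp -oj output shape that transcribe() parses;
# offsets are in milliseconds, which the parser divides by 1000.
example_output = {
    "result": {"language": "en"},
    "transcription": [
        {
            "offsets": {"from": 0, "to": 4280},
            "text": " Welcome everyone to the weekly sync.",
        },
    ],
}
```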
@ -0,0 +1,41 @@
# Transcription Service Dependencies

# Web framework
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9

# Job queue
redis==5.0.1
rq==1.16.0

# Database
asyncpg==0.29.0
sqlalchemy[asyncio]==2.0.25
psycopg2-binary==2.9.9

# Audio processing
pydub==0.25.1
soundfile==0.12.1
librosa==0.10.1
numpy==1.26.4

# Speaker diarization
resemblyzer==0.1.3
torch==2.2.0
torchaudio==2.2.0
scipy==1.12.0
scikit-learn==1.4.0

# Sentence embeddings (for semantic search)
sentence-transformers==2.3.1

# Utilities
pydantic==2.6.1
pydantic-settings==2.1.0
python-dotenv==1.0.1
httpx==0.26.0
tenacity==8.2.3

# Logging & monitoring
structlog==24.1.0
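The sentence-transformers pin above is what backs the pgvector semantic search. A minimal embedding sketch, assuming a small general-purpose model (the model name is an assumption; this commit doesn't pin one):

```
# Minimal embedding sketch for semantic search; the model name
# 'all-MiniLM-L6-v2' is an assumption, not specified in this commit.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["Welcome everyone to the weekly sync."]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (1, 384) for this model
```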