feat(meeting-intelligence): add backend infrastructure for transcription and AI summaries

Add complete Meeting Intelligence System infrastructure:

Backend Services:
- PostgreSQL schema with pgvector for semantic search
- Transcription service using whisper.cpp and resemblyzer for diarization
- Meeting Intelligence API with FastAPI
- Jibri configuration for recording

API Endpoints:
- /meetings - List, get, delete meetings
- /meetings/{id}/transcript - Get transcripts with speaker attribution
- /meetings/{id}/summary - Generate AI summaries via Ollama
- /search - Full-text and semantic search
- /meetings/{id}/export - Export as PDF, Markdown, JSON
- /webhooks/recording-complete - Jibri callback

Features:
- Zero-cost local transcription (whisper.cpp CPU)
- Speaker diarization (who said what)
- AI-powered summaries with key points, action items, decisions
- Vector embeddings for semantic search
- Multi-format export

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Jeff Emmett 2026-02-05 19:04:19 +00:00
parent f56986818b
commit 4cb219db0f
27 changed files with 4017 additions and 0 deletions

@@ -0,0 +1,17 @@
# Meeting Intelligence System - Environment Variables
# Copy this file to .env and update values
# PostgreSQL
POSTGRES_PASSWORD=your-secure-password-here
# API Security
API_SECRET_KEY=your-api-secret-key-here
# Jibri XMPP Configuration
XMPP_SERVER=meet.jeffemmett.com
XMPP_DOMAIN=meet.jeffemmett.com
JIBRI_XMPP_PASSWORD=jibri-xmpp-password
JIBRI_RECORDER_PASSWORD=recorder-password
# Ollama (uses host.docker.internal by default)
# OLLAMA_URL=http://host.docker.internal:11434

@@ -0,0 +1,151 @@
# Meeting Intelligence System
A fully self-hosted, zero-cost meeting intelligence system for Jeffsi Meet that provides:
- Automatic meeting recording via Jibri
- Local transcription via whisper.cpp (CPU-only)
- Speaker diarization (who said what)
- AI-powered summaries via Ollama
- Searchable meeting archive with dashboard
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ Netcup RS 8000 (Backend) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Jibri │───▶│ Whisper │───▶│ AI Processor │ │
│ │ Recording │ │ Transcriber │ │ (Ollama + Summarizer) │ │
│ │ Container │ │ Service │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL + pgvector │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Components
| Service | Port | Description |
|---------|------|-------------|
| PostgreSQL | 5432 | Database with pgvector for semantic search |
| Redis | 6379 | Job queue for async processing |
| Transcriber | 8001 | whisper.cpp + speaker diarization |
| API | 8000 | REST API for meetings, transcripts, search |
| Jibri | - | Recording service (joins meetings as hidden participant) |
## Deployment
### Prerequisites
1. Docker and Docker Compose installed
2. Ollama running on the host (for AI summaries)
3. Jeffsi Meet configured with recording enabled
### Setup
1. Copy environment file:
```bash
cp .env.example .env
```
2. Edit `.env` with your configuration:
```bash
vim .env
```
3. Create storage directories:
```bash
sudo mkdir -p /opt/meetings/{recordings,audio}
sudo chown -R 1000:1000 /opt/meetings
```
4. Start services:
```bash
docker compose up -d
```
5. Check logs:
```bash
docker compose logs -f
```
## API Endpoints
Base URL: `https://meet.jeffemmett.com/api/intelligence`
### Meetings
- `GET /meetings` - List all meetings
- `GET /meetings/{id}` - Get meeting details
- `DELETE /meetings/{id}` - Delete meeting
### Transcripts
- `GET /meetings/{id}/transcript` - Get full transcript
- `GET /meetings/{id}/transcript/text` - Get as plain text
- `GET /meetings/{id}/speakers` - Get speaker statistics
### Summaries
- `GET /meetings/{id}/summary` - Get AI summary
- `POST /meetings/{id}/summary` - Generate summary
### Search
- `POST /search` - Search transcripts (text + semantic)
- `GET /search/suggest` - Get search suggestions
### Export
- `GET /meetings/{id}/export?format=markdown` - Export as Markdown
- `GET /meetings/{id}/export?format=json` - Export as JSON
- `GET /meetings/{id}/export?format=pdf` - Export as PDF
### Webhooks
- `POST /webhooks/recording-complete` - Jibri recording callback
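The endpoints above return plain JSON (or a file download for exports). A minimal client sketch, assuming the base URL above, no API key enforcement, and placeholder query text:
```python
import httpx

BASE = "https://meet.jeffemmett.com/api/intelligence"

with httpx.Client(base_url=BASE, timeout=30.0) as client:
    # Most recent meetings (paginated)
    meetings = client.get("/meetings", params={"limit": 10}).json()["meetings"]

    # Combined full-text + semantic search across all transcripts
    hits = client.post("/search", json={
        "query": "action items from the budget discussion",
        "search_type": "combined",
        "limit": 10,
    }).json()["results"]

    # Markdown export of the most recent meeting
    if meetings:
        export = client.get(
            f"/meetings/{meetings[0]['id']}/export",
            params={"format": "markdown"},
        )
        open("meeting.md", "wb").write(export.content)
```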
## Processing Pipeline
1. **Recording** - Jibri joins meeting and records
2. **Webhook** - Jibri calls `/webhooks/recording-complete`
3. **Audio Extraction** - FFmpeg extracts audio from video
4. **Transcription** - whisper.cpp transcribes audio
5. **Diarization** - resemblyzer identifies speakers
6. **Embedding** - Generate vector embeddings for search
7. **Summary** - Ollama generates AI summary
8. **Ready** - Meeting available in dashboard
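Steps 2-8 can also be triggered by hand, for example to reprocess a recording that already exists on disk, by posting the same webhook the Jibri finalize script sends. A sketch, assuming the API is reachable at its internal default `http://api:8000` and using placeholder conference/path values:
```python
import httpx

# Placeholder conference id and recording path, following the layout the
# finalize script expects (/recordings/<conference_id>/<timestamp>/...).
payload = {
    "event_type": "recording_completed",
    "conference_id": "weekly-sync",
    "recording_path": "/recordings/weekly-sync/2026-02-05-19-00/recording.mp4",
    "file_size_bytes": 734003200,
    "completed_at": "2026-02-05T19:30:00+00:00",
    "metadata": {},
}

resp = httpx.post(
    "http://api:8000/webhooks/recording-complete",
    json=payload,
    timeout=30.0,
)
resp.raise_for_status()
print(resp.json())  # {"status": "accepted", "meeting_id": "...", ...}
```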
## Resource Usage
| Service | CPU | RAM | Storage |
|---------|-----|-----|---------|
| Transcriber | 8 cores | 12GB | 5GB (models) |
| API | 1 core | 2GB | - |
| PostgreSQL | 2 cores | 4GB | ~50GB |
| Jibri | 2 cores | 4GB | - |
| Redis | 0.5 cores | 512MB | - |
## Troubleshooting
### Transcription is slow
- Check CPU usage: `docker stats meeting-intelligence-transcriber`
- Increase `WHISPER_THREADS` in docker-compose.yml
- Consider using the `tiny` model for faster (less accurate) transcription
### No summary generated
- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Check logs: `docker compose logs api`
- Verify model is available: `ollama list`
### Recording not starting
- Check Jibri logs: `docker compose logs jibri`
- Verify XMPP credentials in `.env`
- Check Prosody recorder virtual host configuration
## Cost Analysis
| Component | Monthly Cost |
|-----------|-------------|
| Jibri recording | $0 (local) |
| Whisper transcription | $0 (local CPU) |
| Ollama summarization | $0 (local) |
| PostgreSQL | $0 (local) |
| **Total** | **$0/month** |

@@ -0,0 +1,32 @@
# Meeting Intelligence API
# Provides REST API for meeting transcripts, summaries, and search
FROM python:3.11-slim
# Install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app/ ./app/
# Create directories
RUN mkdir -p /recordings /logs
# Environment variables
ENV PYTHONUNBUFFERED=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run the service
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

@@ -0,0 +1 @@
# Meeting Intelligence API

@@ -0,0 +1,50 @@
"""
Configuration settings for the Meeting Intelligence API.
"""
from typing import List
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
"""Application settings loaded from environment variables."""
# Database
postgres_url: str = "postgresql://meeting_intelligence:changeme@localhost:5432/meeting_intelligence"
# Redis
redis_url: str = "redis://localhost:6379"
# Ollama (for AI summaries)
ollama_url: str = "http://localhost:11434"
ollama_model: str = "llama3.2"
# File paths
recordings_path: str = "/recordings"
# Security
secret_key: str = "changeme"
api_key: str = "" # Optional API key authentication
# CORS
cors_origins: List[str] = [
"https://meet.jeffemmett.com",
"http://localhost:8080",
"http://localhost:3000"
]
# Embeddings model for semantic search
embedding_model: str = "all-MiniLM-L6-v2"
# Export settings
export_temp_dir: str = "/tmp/exports"
# Transcriber service URL
transcriber_url: str = "http://transcriber:8001"
class Config:
env_file = ".env"
env_file_encoding = "utf-8"
settings = Settings()
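# Note: pydantic-settings matches environment variables to field names
# case-insensitively, so POSTGRES_URL, OLLAMA_MODEL, etc. set in the
# container environment (or .env) override the defaults above.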

@@ -0,0 +1,355 @@
"""
Database operations for the Meeting Intelligence API.
"""
import json
import uuid
from datetime import datetime
from typing import Optional, List, Dict, Any
import asyncpg
import structlog
log = structlog.get_logger()
class Database:
"""Database operations for Meeting Intelligence API."""
def __init__(self, connection_string: str):
self.connection_string = connection_string
self.pool: Optional[asyncpg.Pool] = None
async def connect(self):
"""Establish database connection pool."""
log.info("Connecting to database...")
self.pool = await asyncpg.create_pool(
self.connection_string,
min_size=2,
max_size=20,
init=self._init_connection
)
log.info("Database connected")
async def _init_connection(self, conn):
"""Encode/decode JSONB columns as Python dicts/lists (asyncpg expects str otherwise)."""
await conn.set_type_codec(
"jsonb",
encoder=json.dumps,
decoder=json.loads,
schema="pg_catalog"
)
async def disconnect(self):
"""Close database connection pool."""
if self.pool:
await self.pool.close()
log.info("Database disconnected")
async def health_check(self):
"""Check database connectivity."""
async with self.pool.acquire() as conn:
await conn.fetchval("SELECT 1")
# ==================== Meetings ====================
async def list_meetings(
self,
limit: int = 50,
offset: int = 0,
status: Optional[str] = None
) -> List[Dict[str, Any]]:
"""List meetings with pagination."""
async with self.pool.acquire() as conn:
if status:
rows = await conn.fetch("""
SELECT id, conference_id, conference_name, title,
started_at, ended_at, duration_seconds,
status, created_at
FROM meetings
WHERE status = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3
""", status, limit, offset)
else:
rows = await conn.fetch("""
SELECT id, conference_id, conference_name, title,
started_at, ended_at, duration_seconds,
status, created_at
FROM meetings
ORDER BY created_at DESC
LIMIT $1 OFFSET $2
""", limit, offset)
return [dict(row) for row in rows]
async def get_meeting(self, meeting_id: str) -> Optional[Dict[str, Any]]:
"""Get meeting details."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT m.id, m.conference_id, m.conference_name, m.title,
m.started_at, m.ended_at, m.duration_seconds,
m.recording_path, m.audio_path, m.status,
m.metadata, m.created_at,
(SELECT COUNT(*) FROM transcripts WHERE meeting_id = m.id) as segment_count,
(SELECT COUNT(*) FROM meeting_participants WHERE meeting_id = m.id) as participant_count,
(SELECT id FROM summaries WHERE meeting_id = m.id LIMIT 1) as summary_id
FROM meetings m
WHERE m.id = $1::uuid
""", meeting_id)
if row:
return dict(row)
return None
async def create_meeting(
self,
conference_id: str,
conference_name: Optional[str] = None,
title: Optional[str] = None,
recording_path: Optional[str] = None,
started_at: Optional[datetime] = None,
metadata: Optional[dict] = None
) -> str:
"""Create a new meeting record."""
meeting_id = str(uuid.uuid4())
async with self.pool.acquire() as conn:
await conn.execute("""
INSERT INTO meetings (
id, conference_id, conference_name, title,
recording_path, started_at, status, metadata
)
VALUES ($1, $2, $3, $4, $5, $6, 'recording', $7)
""", meeting_id, conference_id, conference_name, title,
recording_path, started_at or datetime.utcnow(), metadata or {})
return meeting_id
async def update_meeting(
self,
meeting_id: str,
**kwargs
):
"""Update meeting fields."""
if not kwargs:
return
set_clauses = []
values = []
i = 1
for key, value in kwargs.items():
if key in ['status', 'title', 'ended_at', 'duration_seconds',
'recording_path', 'audio_path', 'error_message']:
set_clauses.append(f"{key} = ${i}")
values.append(value)
i += 1
if not set_clauses:
return
values.append(meeting_id)
async with self.pool.acquire() as conn:
await conn.execute(f"""
UPDATE meetings
SET {', '.join(set_clauses)}, updated_at = NOW()
WHERE id = ${i}::uuid
""", *values)
# ==================== Transcripts ====================
async def get_transcript(
self,
meeting_id: str,
speaker_filter: Optional[str] = None
) -> List[Dict[str, Any]]:
"""Get transcript segments for a meeting."""
async with self.pool.acquire() as conn:
if speaker_filter:
rows = await conn.fetch("""
SELECT id, segment_index, start_time, end_time,
speaker_id, speaker_name, speaker_label,
text, confidence, language
FROM transcripts
WHERE meeting_id = $1::uuid AND speaker_id = $2
ORDER BY segment_index ASC
""", meeting_id, speaker_filter)
else:
rows = await conn.fetch("""
SELECT id, segment_index, start_time, end_time,
speaker_id, speaker_name, speaker_label,
text, confidence, language
FROM transcripts
WHERE meeting_id = $1::uuid
ORDER BY segment_index ASC
""", meeting_id)
return [dict(row) for row in rows]
async def get_speakers(self, meeting_id: str) -> List[Dict[str, Any]]:
"""Get speaker statistics for a meeting."""
async with self.pool.acquire() as conn:
rows = await conn.fetch("""
SELECT speaker_id, speaker_label,
COUNT(*) as segment_count,
SUM(end_time - start_time) as speaking_time,
SUM(LENGTH(text)) as character_count
FROM transcripts
WHERE meeting_id = $1::uuid AND speaker_id IS NOT NULL
GROUP BY speaker_id, speaker_label
ORDER BY speaking_time DESC
""", meeting_id)
return [dict(row) for row in rows]
# ==================== Summaries ====================
async def get_summary(self, meeting_id: str) -> Optional[Dict[str, Any]]:
"""Get AI summary for a meeting."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT id, meeting_id, summary_text, key_points,
action_items, decisions, topics, sentiment,
model_used, generated_at
FROM summaries
WHERE meeting_id = $1::uuid
ORDER BY generated_at DESC
LIMIT 1
""", meeting_id)
if row:
return dict(row)
return None
async def save_summary(
self,
meeting_id: str,
summary_text: str,
key_points: List[str],
action_items: List[dict],
decisions: List[str],
topics: List[dict],
sentiment: str,
model_used: str,
prompt_tokens: int = 0,
completion_tokens: int = 0
) -> int:
"""Save AI-generated summary."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
INSERT INTO summaries (
meeting_id, summary_text, key_points, action_items,
decisions, topics, sentiment, model_used,
prompt_tokens, completion_tokens
)
VALUES ($1::uuid, $2, $3, $4, $5, $6, $7, $8, $9, $10)
RETURNING id
""", meeting_id, summary_text, key_points, action_items,
decisions, topics, sentiment, model_used,
prompt_tokens, completion_tokens)
return row["id"]
# ==================== Search ====================
async def fulltext_search(
self,
query: str,
meeting_id: Optional[str] = None,
limit: int = 50
) -> List[Dict[str, Any]]:
"""Full-text search across transcripts."""
async with self.pool.acquire() as conn:
if meeting_id:
rows = await conn.fetch("""
SELECT t.id, t.meeting_id, t.start_time, t.end_time,
t.speaker_label, t.text, m.title as meeting_title,
ts_rank(to_tsvector('english', t.text),
plainto_tsquery('english', $1)) as rank
FROM transcripts t
JOIN meetings m ON t.meeting_id = m.id
WHERE t.meeting_id = $2::uuid
AND to_tsvector('english', t.text) @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT $3
""", query, meeting_id, limit)
else:
rows = await conn.fetch("""
SELECT t.id, t.meeting_id, t.start_time, t.end_time,
t.speaker_label, t.text, m.title as meeting_title,
ts_rank(to_tsvector('english', t.text),
plainto_tsquery('english', $1)) as rank
FROM transcripts t
JOIN meetings m ON t.meeting_id = m.id
WHERE to_tsvector('english', t.text) @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT $2
""", query, limit)
return [dict(row) for row in rows]
async def semantic_search(
self,
embedding: List[float],
meeting_id: Optional[str] = None,
threshold: float = 0.7,
limit: int = 20
) -> List[Dict[str, Any]]:
"""Semantic search using vector embeddings."""
async with self.pool.acquire() as conn:
embedding_str = f"[{','.join(map(str, embedding))}]"
if meeting_id:
rows = await conn.fetch("""
SELECT te.transcript_id, te.meeting_id, te.chunk_text,
t.start_time, t.speaker_label, m.title as meeting_title,
1 - (te.embedding <=> $1::vector) as similarity
FROM transcript_embeddings te
JOIN transcripts t ON te.transcript_id = t.id
JOIN meetings m ON te.meeting_id = m.id
WHERE te.meeting_id = $2::uuid
AND 1 - (te.embedding <=> $1::vector) > $3
ORDER BY te.embedding <=> $1::vector
LIMIT $4
""", embedding_str, meeting_id, threshold, limit)
else:
rows = await conn.fetch("""
SELECT te.transcript_id, te.meeting_id, te.chunk_text,
t.start_time, t.speaker_label, m.title as meeting_title,
1 - (te.embedding <=> $1::vector) as similarity
FROM transcript_embeddings te
JOIN transcripts t ON te.transcript_id = t.id
JOIN meetings m ON te.meeting_id = m.id
WHERE 1 - (te.embedding <=> $1::vector) > $2
ORDER BY te.embedding <=> $1::vector
LIMIT $3
""", embedding_str, threshold, limit)
return [dict(row) for row in rows]
# ==================== Webhooks ====================
async def save_webhook_event(
self,
event_type: str,
payload: dict
) -> int:
"""Save a webhook event for processing."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
INSERT INTO webhook_events (event_type, payload)
VALUES ($1, $2)
RETURNING id
""", event_type, payload)
return row["id"]
# ==================== Jobs ====================
async def create_job(
self,
meeting_id: str,
job_type: str,
priority: int = 5,
result: Optional[dict] = None
) -> int:
"""Create a processing job."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
INSERT INTO processing_jobs (meeting_id, job_type, priority, result)
VALUES ($1::uuid, $2, $3, $4)
RETURNING id
""", meeting_id, job_type, priority, result or {})
return row["id"]

@@ -0,0 +1,113 @@
"""
Meeting Intelligence API
Provides REST API for:
- Meeting management
- Transcript retrieval
- AI-powered summaries
- Semantic search
- Export functionality
"""
import os
from contextlib import asynccontextmanager
from typing import Optional
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from .config import settings
from .database import Database
from .routes import meetings, transcripts, summaries, search, webhooks, export
import structlog
log = structlog.get_logger()
# Application state
class AppState:
db: Optional[Database] = None
state = AppState()
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application startup and shutdown."""
log.info("Starting Meeting Intelligence API...")
# Initialize database
state.db = Database(settings.postgres_url)
await state.db.connect()
# Make database available to routes
app.state.db = state.db
log.info("Meeting Intelligence API started successfully")
yield
# Shutdown
log.info("Shutting down Meeting Intelligence API...")
if state.db:
await state.db.disconnect()
log.info("Meeting Intelligence API stopped")
app = FastAPI(
title="Meeting Intelligence API",
description="API for meeting transcripts, summaries, and search",
version="1.0.0",
lifespan=lifespan,
docs_url="/docs",
redoc_url="/redoc"
)
# CORS configuration
app.add_middleware(
CORSMiddleware,
allow_origins=settings.cors_origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include routers
app.include_router(meetings.router, prefix="/meetings", tags=["Meetings"])
app.include_router(transcripts.router, prefix="/meetings", tags=["Transcripts"])
app.include_router(summaries.router, prefix="/meetings", tags=["Summaries"])
app.include_router(search.router, prefix="/search", tags=["Search"])
app.include_router(webhooks.router, prefix="/webhooks", tags=["Webhooks"])
app.include_router(export.router, prefix="/meetings", tags=["Export"])
@app.get("/health")
async def health_check():
"""Health check endpoint."""
db_ok = False
try:
if state.db:
await state.db.health_check()
db_ok = True
except Exception as e:
log.error("Database health check failed", error=str(e))
return {
"status": "healthy" if db_ok else "unhealthy",
"database": db_ok,
"version": "1.0.0"
}
@app.get("/")
async def root():
"""Root endpoint."""
return {
"service": "Meeting Intelligence API",
"version": "1.0.0",
"docs": "/docs"
}

@@ -0,0 +1,2 @@
# API Routes
from . import meetings, transcripts, summaries, search, webhooks, export

@@ -0,0 +1,319 @@
"""
Export routes for Meeting Intelligence.
Supports exporting meetings as PDF, Markdown, and JSON.
"""
import io
import json
import os
from datetime import datetime
from typing import Optional
from fastapi import APIRouter, HTTPException, Request, Response
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import structlog
log = structlog.get_logger()
router = APIRouter()
class ExportRequest(BaseModel):
format: str = "markdown" # "pdf", "markdown", "json"
include_transcript: bool = True
include_summary: bool = True
@router.get("/{meeting_id}/export")
async def export_meeting(
request: Request,
meeting_id: str,
format: str = "markdown",
include_transcript: bool = True,
include_summary: bool = True
):
"""Export meeting data in various formats."""
db = request.app.state.db
# Get meeting data
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
# Get transcript if requested
transcript = None
if include_transcript:
transcript = await db.get_transcript(meeting_id)
# Get summary if requested
summary = None
if include_summary:
summary = await db.get_summary(meeting_id)
# Export based on format
if format == "json":
return _export_json(meeting, transcript, summary)
elif format == "markdown":
return _export_markdown(meeting, transcript, summary)
elif format == "pdf":
return await _export_pdf(meeting, transcript, summary)
else:
raise HTTPException(
status_code=400,
detail=f"Unsupported format: {format}. Use: json, markdown, pdf"
)
def _export_json(meeting: dict, transcript: list, summary: dict) -> Response:
"""Export as JSON."""
data = {
"meeting": {
"id": str(meeting["id"]),
"conference_id": meeting["conference_id"],
"title": meeting.get("title"),
"started_at": meeting["started_at"].isoformat() if meeting.get("started_at") else None,
"ended_at": meeting["ended_at"].isoformat() if meeting.get("ended_at") else None,
"duration_seconds": meeting.get("duration_seconds"),
"status": meeting["status"]
},
"transcript": [
{
"start_time": s["start_time"],
"end_time": s["end_time"],
"speaker": s.get("speaker_label"),
"text": s["text"]
}
for s in (transcript or [])
] if transcript else None,
"summary": {
"text": summary["summary_text"],
"key_points": summary["key_points"],
"action_items": summary["action_items"],
"decisions": summary["decisions"],
"topics": summary["topics"],
"sentiment": summary.get("sentiment")
} if summary else None,
"exported_at": datetime.utcnow().isoformat()
}
filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.json"
return Response(
content=json.dumps(data, indent=2),
media_type="application/json",
headers={
"Content-Disposition": f'attachment; filename="{filename}"'
}
)
def _export_markdown(meeting: dict, transcript: list, summary: dict) -> Response:
"""Export as Markdown."""
lines = []
# Header
title = meeting.get("title") or f"Meeting: {meeting['conference_id']}"
lines.append(f"# {title}")
lines.append("")
# Metadata
lines.append("## Meeting Details")
lines.append("")
lines.append(f"- **Conference ID:** {meeting['conference_id']}")
if meeting.get("started_at"):
lines.append(f"- **Date:** {meeting['started_at'].strftime('%Y-%m-%d %H:%M UTC')}")
if meeting.get("duration_seconds"):
minutes = meeting["duration_seconds"] // 60
lines.append(f"- **Duration:** {minutes} minutes")
lines.append(f"- **Status:** {meeting['status']}")
lines.append("")
# Summary
if summary:
lines.append("## Summary")
lines.append("")
lines.append(summary["summary_text"])
lines.append("")
# Key Points
if summary.get("key_points"):
lines.append("### Key Points")
lines.append("")
for point in summary["key_points"]:
lines.append(f"- {point}")
lines.append("")
# Action Items
if summary.get("action_items"):
lines.append("### Action Items")
lines.append("")
for item in summary["action_items"]:
task = item.get("task", item) if isinstance(item, dict) else item
assignee = item.get("assignee", "") if isinstance(item, dict) else ""
checkbox = "[ ]"
if assignee:
lines.append(f"- {checkbox} {task} *(Assigned: {assignee})*")
else:
lines.append(f"- {checkbox} {task}")
lines.append("")
# Decisions
if summary.get("decisions"):
lines.append("### Decisions")
lines.append("")
for decision in summary["decisions"]:
lines.append(f"- {decision}")
lines.append("")
# Transcript
if transcript:
lines.append("## Transcript")
lines.append("")
current_speaker = None
for segment in transcript:
speaker = segment.get("speaker_label") or "Speaker"
time_str = _format_time(segment["start_time"])
if speaker != current_speaker:
lines.append("")
lines.append(f"**{speaker}** *({time_str})*")
current_speaker = speaker
lines.append(f"> {segment['text']}")
lines.append("")
# Footer
lines.append("---")
lines.append(f"*Exported on {datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')} by Meeting Intelligence*")
content = "\n".join(lines)
filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.md"
return Response(
content=content,
media_type="text/markdown",
headers={
"Content-Disposition": f'attachment; filename="{filename}"'
}
)
async def _export_pdf(meeting: dict, transcript: list, summary: dict) -> StreamingResponse:
"""Export as PDF using reportlab."""
try:
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, ListFlowable, ListItem
except ImportError:
raise HTTPException(
status_code=501,
detail="PDF export requires reportlab. Use markdown or json format."
)
buffer = io.BytesIO()
# Create PDF document
doc = SimpleDocTemplate(
buffer,
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=72
)
styles = getSampleStyleSheet()
story = []
# Title
title = meeting.get("title") or f"Meeting: {meeting['conference_id']}"
story.append(Paragraph(title, styles['Title']))
story.append(Spacer(1, 12))
# Metadata
story.append(Paragraph("Meeting Details", styles['Heading2']))
if meeting.get("started_at"):
story.append(Paragraph(
f"Date: {meeting['started_at'].strftime('%Y-%m-%d %H:%M UTC')}",
styles['Normal']
))
if meeting.get("duration_seconds"):
minutes = meeting["duration_seconds"] // 60
story.append(Paragraph(f"Duration: {minutes} minutes", styles['Normal']))
story.append(Spacer(1, 12))
# Summary
if summary:
story.append(Paragraph("Summary", styles['Heading2']))
story.append(Paragraph(summary["summary_text"], styles['Normal']))
story.append(Spacer(1, 12))
if summary.get("key_points"):
story.append(Paragraph("Key Points", styles['Heading3']))
for point in summary["key_points"]:
story.append(Paragraph(f"{point}", styles['Normal']))
story.append(Spacer(1, 12))
if summary.get("action_items"):
story.append(Paragraph("Action Items", styles['Heading3']))
for item in summary["action_items"]:
task = item.get("task", item) if isinstance(item, dict) else item
story.append(Paragraph(f"{task}", styles['Normal']))
story.append(Spacer(1, 12))
# Transcript (abbreviated for PDF)
if transcript:
story.append(Paragraph("Transcript", styles['Heading2']))
current_speaker = None
for segment in transcript[:100]: # Limit segments for PDF
speaker = segment.get("speaker_label") or "Speaker"
if speaker != current_speaker:
story.append(Spacer(1, 6))
story.append(Paragraph(
f"<b>{speaker}</b> ({_format_time(segment['start_time'])})",
styles['Normal']
))
current_speaker = speaker
story.append(Paragraph(segment['text'], styles['Normal']))
if len(transcript) > 100:
story.append(Spacer(1, 12))
story.append(Paragraph(
f"[... {len(transcript) - 100} more segments not shown in PDF]",
styles['Normal']
))
# Build PDF
doc.build(story)
buffer.seek(0)
filename = f"meeting-{meeting['conference_id']}-{datetime.utcnow().strftime('%Y%m%d')}.pdf"
return StreamingResponse(
buffer,
media_type="application/pdf",
headers={
"Content-Disposition": f'attachment; filename="{filename}"'
}
)
def _format_time(seconds: float) -> str:
"""Format seconds as HH:MM:SS or MM:SS."""
total_seconds = int(seconds)
hours = total_seconds // 3600
minutes = (total_seconds % 3600) // 60
secs = total_seconds % 60
if hours > 0:
return f"{hours}:{minutes:02d}:{secs:02d}"
return f"{minutes}:{secs:02d}"

@@ -0,0 +1,112 @@
"""
Meeting management routes.
"""
from typing import Optional, List
from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel
import structlog
log = structlog.get_logger()
router = APIRouter()
class MeetingResponse(BaseModel):
id: str
conference_id: str
conference_name: Optional[str]
title: Optional[str]
started_at: Optional[str]
ended_at: Optional[str]
duration_seconds: Optional[int]
status: str
created_at: str
segment_count: Optional[int] = None
participant_count: Optional[int] = None
has_summary: Optional[bool] = None
class MeetingListResponse(BaseModel):
meetings: List[MeetingResponse]
total: int
limit: int
offset: int
@router.get("", response_model=MeetingListResponse)
async def list_meetings(
request: Request,
limit: int = Query(default=50, le=100),
offset: int = Query(default=0, ge=0),
status: Optional[str] = Query(default=None)
):
"""List all meetings with pagination."""
db = request.app.state.db
meetings = await db.list_meetings(limit=limit, offset=offset, status=status)
return MeetingListResponse(
meetings=[
MeetingResponse(
id=str(m["id"]),
conference_id=m["conference_id"],
conference_name=m.get("conference_name"),
title=m.get("title"),
started_at=m["started_at"].isoformat() if m.get("started_at") else None,
ended_at=m["ended_at"].isoformat() if m.get("ended_at") else None,
duration_seconds=m.get("duration_seconds"),
status=m["status"],
created_at=m["created_at"].isoformat()
)
for m in meetings
],
total=len(meetings), # TODO: Add total count query
limit=limit,
offset=offset
)
@router.get("/{meeting_id}", response_model=MeetingResponse)
async def get_meeting(request: Request, meeting_id: str):
"""Get meeting details."""
db = request.app.state.db
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
return MeetingResponse(
id=str(meeting["id"]),
conference_id=meeting["conference_id"],
conference_name=meeting.get("conference_name"),
title=meeting.get("title"),
started_at=meeting["started_at"].isoformat() if meeting.get("started_at") else None,
ended_at=meeting["ended_at"].isoformat() if meeting.get("ended_at") else None,
duration_seconds=meeting.get("duration_seconds"),
status=meeting["status"],
created_at=meeting["created_at"].isoformat(),
segment_count=meeting.get("segment_count"),
participant_count=meeting.get("participant_count"),
has_summary=meeting.get("summary_id") is not None
)
@router.delete("/{meeting_id}")
async def delete_meeting(request: Request, meeting_id: str):
"""Delete a meeting and all associated data."""
db = request.app.state.db
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
# TODO: Implement cascade delete
# For now, just mark as deleted
await db.update_meeting(meeting_id, status="deleted")
return {"status": "deleted", "meeting_id": meeting_id}

@@ -0,0 +1,173 @@
"""
Search routes for Meeting Intelligence.
"""
from typing import Optional, List
from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
from ..config import settings
import structlog
log = structlog.get_logger()
router = APIRouter()
# Lazy-load embedding model
_embedding_model = None
def get_embedding_model():
"""Get or initialize the embedding model."""
global _embedding_model
if _embedding_model is None:
log.info("Loading embedding model...", model=settings.embedding_model)
_embedding_model = SentenceTransformer(settings.embedding_model)
log.info("Embedding model loaded")
return _embedding_model
class SearchResult(BaseModel):
meeting_id: str
meeting_title: Optional[str]
text: str
start_time: Optional[float]
speaker_label: Optional[str]
score: float
search_type: str
class SearchResponse(BaseModel):
query: str
results: List[SearchResult]
total: int
search_type: str
class SearchRequest(BaseModel):
query: str
meeting_id: Optional[str] = None
search_type: str = "combined" # "text", "semantic", "combined"
limit: int = 20
@router.post("", response_model=SearchResponse)
async def search_transcripts(request: Request, body: SearchRequest):
"""Search across meeting transcripts.
Search types:
- text: Full-text search using PostgreSQL ts_vector
- semantic: Semantic search using vector embeddings
- combined: Both text and semantic search, merged results
"""
db = request.app.state.db
if not body.query or len(body.query.strip()) < 2:
raise HTTPException(
status_code=400,
detail="Query must be at least 2 characters"
)
results = []
# Full-text search
if body.search_type in ["text", "combined"]:
text_results = await db.fulltext_search(
query=body.query,
meeting_id=body.meeting_id,
limit=body.limit
)
for r in text_results:
results.append(SearchResult(
meeting_id=str(r["meeting_id"]),
meeting_title=r.get("meeting_title"),
text=r["text"],
start_time=r.get("start_time"),
speaker_label=r.get("speaker_label"),
score=float(r["rank"]),
search_type="text"
))
# Semantic search
if body.search_type in ["semantic", "combined"]:
try:
model = get_embedding_model()
query_embedding = model.encode(body.query).tolist()
semantic_results = await db.semantic_search(
embedding=query_embedding,
meeting_id=body.meeting_id,
threshold=0.6,
limit=body.limit
)
for r in semantic_results:
results.append(SearchResult(
meeting_id=str(r["meeting_id"]),
meeting_title=r.get("meeting_title"),
text=r["chunk_text"],
start_time=r.get("start_time"),
speaker_label=r.get("speaker_label"),
score=float(r["similarity"]),
search_type="semantic"
))
except Exception as e:
log.error("Semantic search failed", error=str(e))
if body.search_type == "semantic":
raise HTTPException(
status_code=500,
detail=f"Semantic search failed: {str(e)}"
)
# Deduplicate and sort by score
seen = set()
unique_results = []
for r in sorted(results, key=lambda x: x.score, reverse=True):
key = (r.meeting_id, r.text[:100])
if key not in seen:
seen.add(key)
unique_results.append(r)
return SearchResponse(
query=body.query,
results=unique_results[:body.limit],
total=len(unique_results),
search_type=body.search_type
)
@router.get("/suggest")
async def search_suggestions(
request: Request,
q: str = Query(..., min_length=2)
):
"""Get search suggestions based on partial query."""
db = request.app.state.db
# Simple prefix search on common terms
results = await db.fulltext_search(query=q, limit=5)
# Extract unique phrases
suggestions = []
for r in results:
# Get surrounding context
text = r["text"]
words = text.split()
# Find matching words and get context
for i, word in enumerate(words):
if q.lower() in word.lower():
start = max(0, i - 2)
end = min(len(words), i + 3)
phrase = " ".join(words[start:end])
if phrase not in suggestions:
suggestions.append(phrase)
if len(suggestions) >= 5:
break
return {"suggestions": suggestions}

@@ -0,0 +1,251 @@
"""
AI Summary routes.
"""
import json
from typing import Optional, List
import httpx
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from pydantic import BaseModel
from ..config import settings
import structlog
log = structlog.get_logger()
router = APIRouter()
class ActionItem(BaseModel):
task: str
assignee: Optional[str] = None
due_date: Optional[str] = None
completed: bool = False
class Topic(BaseModel):
topic: str
duration_seconds: Optional[float] = None
relevance_score: Optional[float] = None
class SummaryResponse(BaseModel):
meeting_id: str
summary_text: str
key_points: List[str]
action_items: List[ActionItem]
decisions: List[str]
topics: List[Topic]
sentiment: Optional[str]
model_used: str
generated_at: str
class GenerateSummaryRequest(BaseModel):
force_regenerate: bool = False
# Summarization prompt template
SUMMARY_PROMPT = """You are analyzing a meeting transcript. Your task is to extract key information and provide a structured summary.
## Meeting Transcript:
{transcript}
## Instructions:
Analyze the transcript and extract the following information. Be concise and accurate.
Respond ONLY with a valid JSON object in this exact format (no markdown, no extra text):
{{
"summary": "A 2-3 sentence overview of what was discussed in the meeting",
"key_points": ["Point 1", "Point 2", "Point 3"],
"action_items": [
{{"task": "Description of task", "assignee": "Person name or null", "due_date": "Date or null"}}
],
"decisions": ["Decision 1", "Decision 2"],
"topics": [
{{"topic": "Topic name", "relevance_score": 0.9}}
],
"sentiment": "positive" or "neutral" or "negative" or "mixed"
}}
Remember:
- key_points: 3-5 most important points discussed
- action_items: Tasks that need to be done, with assignees if mentioned
- decisions: Any decisions or conclusions reached
- topics: Main themes discussed with relevance scores (0-1)
- sentiment: Overall tone of the meeting
"""
@router.get("/{meeting_id}/summary", response_model=SummaryResponse)
async def get_summary(request: Request, meeting_id: str):
"""Get AI-generated summary for a meeting."""
db = request.app.state.db
# Verify meeting exists
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
summary = await db.get_summary(meeting_id)
if not summary:
raise HTTPException(
status_code=404,
detail="No summary available. Use POST to generate one."
)
return SummaryResponse(
meeting_id=meeting_id,
summary_text=summary["summary_text"],
key_points=summary["key_points"] or [],
action_items=[
ActionItem(**item) for item in (summary["action_items"] or [])
],
decisions=summary["decisions"] or [],
topics=[
Topic(**topic) for topic in (summary["topics"] or [])
],
sentiment=summary.get("sentiment"),
model_used=summary["model_used"],
generated_at=summary["generated_at"].isoformat()
)
@router.post("/{meeting_id}/summary", response_model=SummaryResponse)
async def generate_summary(
request: Request,
meeting_id: str,
body: GenerateSummaryRequest,
background_tasks: BackgroundTasks
):
"""Generate AI summary for a meeting."""
db = request.app.state.db
# Verify meeting exists
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
# Check if summary already exists
if not body.force_regenerate:
existing = await db.get_summary(meeting_id)
if existing:
raise HTTPException(
status_code=409,
detail="Summary already exists. Set force_regenerate=true to regenerate."
)
# Get transcript
segments = await db.get_transcript(meeting_id)
if not segments:
raise HTTPException(
status_code=400,
detail="No transcript available for summarization"
)
# Format transcript for LLM
transcript_text = _format_transcript(segments)
# Generate summary using Ollama
summary_data = await _generate_summary_with_ollama(transcript_text)
# Save summary
await db.save_summary(
meeting_id=meeting_id,
summary_text=summary_data["summary"],
key_points=summary_data["key_points"],
action_items=summary_data["action_items"],
decisions=summary_data["decisions"],
topics=summary_data["topics"],
sentiment=summary_data["sentiment"],
model_used=settings.ollama_model
)
# Update meeting status
await db.update_meeting(meeting_id, status="ready")
# Get the saved summary
summary = await db.get_summary(meeting_id)
return SummaryResponse(
meeting_id=meeting_id,
summary_text=summary["summary_text"],
key_points=summary["key_points"] or [],
action_items=[
ActionItem(**item) for item in (summary["action_items"] or [])
],
decisions=summary["decisions"] or [],
topics=[
Topic(**topic) for topic in (summary["topics"] or [])
],
sentiment=summary.get("sentiment"),
model_used=summary["model_used"],
generated_at=summary["generated_at"].isoformat()
)
def _format_transcript(segments: list) -> str:
"""Format transcript segments for LLM processing."""
lines = []
current_speaker = None
for s in segments:
speaker = s.get("speaker_label") or "Speaker"
if speaker != current_speaker:
lines.append(f"\n[{speaker}]")
current_speaker = speaker
lines.append(s["text"])
return "\n".join(lines)
async def _generate_summary_with_ollama(transcript: str) -> dict:
"""Generate summary using Ollama."""
prompt = SUMMARY_PROMPT.format(transcript=transcript[:15000]) # Limit context
async with httpx.AsyncClient(timeout=120.0) as client:
try:
response = await client.post(
f"{settings.ollama_url}/api/generate",
json={
"model": settings.ollama_model,
"prompt": prompt,
"stream": False,
"format": "json"
}
)
response.raise_for_status()
result = response.json()
response_text = result.get("response", "")
# Parse JSON from response
summary_data = json.loads(response_text)
# Validate required fields
return {
"summary": summary_data.get("summary", "No summary generated"),
"key_points": summary_data.get("key_points", []),
"action_items": summary_data.get("action_items", []),
"decisions": summary_data.get("decisions", []),
"topics": summary_data.get("topics", []),
"sentiment": summary_data.get("sentiment", "neutral")
}
except httpx.HTTPError as e:
log.error("Ollama request failed", error=str(e))
raise HTTPException(
status_code=503,
detail=f"AI service unavailable: {str(e)}"
)
except json.JSONDecodeError as e:
log.error("Failed to parse Ollama response", error=str(e))
raise HTTPException(
status_code=500,
detail="Failed to parse AI response"
)

@@ -0,0 +1,161 @@
"""
Transcript routes.
"""
from typing import Optional, List
from fastapi import APIRouter, HTTPException, Request, Query
from pydantic import BaseModel
import structlog
log = structlog.get_logger()
router = APIRouter()
class TranscriptSegment(BaseModel):
id: int
segment_index: int
start_time: float
end_time: float
speaker_id: Optional[str]
speaker_name: Optional[str]
speaker_label: Optional[str]
text: str
confidence: Optional[float]
language: Optional[str]
class TranscriptResponse(BaseModel):
meeting_id: str
segments: List[TranscriptSegment]
total_segments: int
duration: Optional[float]
class SpeakerStats(BaseModel):
speaker_id: str
speaker_label: Optional[str]
segment_count: int
speaking_time: float
character_count: int
class SpeakersResponse(BaseModel):
meeting_id: str
speakers: List[SpeakerStats]
@router.get("/{meeting_id}/transcript", response_model=TranscriptResponse)
async def get_transcript(
request: Request,
meeting_id: str,
speaker: Optional[str] = Query(default=None, description="Filter by speaker ID")
):
"""Get full transcript for a meeting."""
db = request.app.state.db
# Verify meeting exists
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
segments = await db.get_transcript(meeting_id, speaker_filter=speaker)
if not segments:
raise HTTPException(
status_code=404,
detail="No transcript available for this meeting"
)
# Calculate duration from last segment
duration = segments[-1]["end_time"] if segments else None
return TranscriptResponse(
meeting_id=meeting_id,
segments=[
TranscriptSegment(
id=s["id"],
segment_index=s["segment_index"],
start_time=s["start_time"],
end_time=s["end_time"],
speaker_id=s.get("speaker_id"),
speaker_name=s.get("speaker_name"),
speaker_label=s.get("speaker_label"),
text=s["text"],
confidence=s.get("confidence"),
language=s.get("language")
)
for s in segments
],
total_segments=len(segments),
duration=duration
)
@router.get("/{meeting_id}/speakers", response_model=SpeakersResponse)
async def get_speakers(request: Request, meeting_id: str):
"""Get speaker statistics for a meeting."""
db = request.app.state.db
# Verify meeting exists
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
speakers = await db.get_speakers(meeting_id)
return SpeakersResponse(
meeting_id=meeting_id,
speakers=[
SpeakerStats(
speaker_id=s["speaker_id"],
speaker_label=s.get("speaker_label"),
segment_count=s["segment_count"],
speaking_time=float(s["speaking_time"] or 0),
character_count=s["character_count"] or 0
)
for s in speakers
]
)
@router.get("/{meeting_id}/transcript/text")
async def get_transcript_text(request: Request, meeting_id: str):
"""Get transcript as plain text."""
db = request.app.state.db
# Verify meeting exists
meeting = await db.get_meeting(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
segments = await db.get_transcript(meeting_id)
if not segments:
raise HTTPException(
status_code=404,
detail="No transcript available for this meeting"
)
# Format as plain text
lines = []
current_speaker = None
for s in segments:
speaker = s.get("speaker_label") or "Unknown"
if speaker != current_speaker:
lines.append(f"\n{speaker}:")
current_speaker = speaker
lines.append(f" {s['text']}")
text = "\n".join(lines)
return {
"meeting_id": meeting_id,
"text": text,
"format": "plain"
}

@@ -0,0 +1,139 @@
"""
Webhook routes for Jibri recording callbacks.
"""
from datetime import datetime
from typing import Optional
import httpx
from fastapi import APIRouter, HTTPException, Request, BackgroundTasks
from pydantic import BaseModel
from ..config import settings
import structlog
log = structlog.get_logger()
router = APIRouter()
class RecordingCompletePayload(BaseModel):
event_type: str
conference_id: str
recording_path: str
recording_dir: Optional[str] = None
file_size_bytes: Optional[int] = None
completed_at: Optional[str] = None
metadata: Optional[dict] = None
class WebhookResponse(BaseModel):
status: str
meeting_id: str
message: str
@router.post("/recording-complete", response_model=WebhookResponse)
async def recording_complete(
request: Request,
payload: RecordingCompletePayload,
background_tasks: BackgroundTasks
):
"""
Webhook called by Jibri when a recording completes.
This triggers the processing pipeline:
1. Create meeting record
2. Queue transcription job
3. (Later) Generate summary
"""
db = request.app.state.db
log.info(
"Recording complete webhook received",
conference_id=payload.conference_id,
recording_path=payload.recording_path
)
# Save webhook event for audit
await db.save_webhook_event(
event_type=payload.event_type,
payload=payload.model_dump()
)
# Create meeting record
meeting_id = await db.create_meeting(
conference_id=payload.conference_id,
conference_name=payload.conference_id, # Use conference_id as name for now
title=f"Meeting - {payload.conference_id}",
recording_path=payload.recording_path,
started_at=datetime.utcnow(), # Will be updated from recording metadata
metadata=payload.metadata or {}
)
log.info("Meeting record created", meeting_id=meeting_id)
# Update meeting status
await db.update_meeting(meeting_id, status="extracting_audio")
# Queue transcription job
job_id = await db.create_job(
meeting_id=meeting_id,
job_type="transcribe",
priority=5,
result={
"video_path": payload.recording_path,
"enable_diarization": True
}
)
log.info("Transcription job queued", job_id=job_id, meeting_id=meeting_id)
# Trigger transcription service asynchronously
background_tasks.add_task(
_notify_transcriber,
meeting_id,
payload.recording_path
)
return WebhookResponse(
status="accepted",
meeting_id=meeting_id,
message="Recording queued for processing"
)
async def _notify_transcriber(meeting_id: str, recording_path: str):
"""Notify the transcription service to start processing."""
try:
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.post(
f"{settings.transcriber_url}/transcribe",
json={
"meeting_id": meeting_id,
"video_path": recording_path,
"enable_diarization": True
}
)
response.raise_for_status()
log.info(
"Transcriber notified",
meeting_id=meeting_id,
response=response.json()
)
except Exception as e:
log.error(
"Failed to notify transcriber",
meeting_id=meeting_id,
error=str(e)
)
# Job is in database, transcriber will pick it up on next poll
@router.post("/test")
async def test_webhook(request: Request):
"""Test endpoint for webhook connectivity."""
body = await request.json()
log.info("Test webhook received", body=body)
return {"status": "ok", "received": body}

@@ -0,0 +1,37 @@
# Meeting Intelligence API Dependencies
# Web framework
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9
# Database
asyncpg==0.29.0
sqlalchemy[asyncio]==2.0.25
psycopg2-binary==2.9.9
# Redis
redis==5.0.1
# HTTP client (for Ollama)
httpx==0.26.0
aiohttp==3.9.3
# Validation
pydantic==2.6.1
pydantic-settings==2.1.0
# Sentence embeddings (for semantic search)
sentence-transformers==2.3.1
numpy==1.26.4
# PDF export
reportlab==4.0.8
markdown2==2.4.12
# Utilities
python-dotenv==1.0.1
tenacity==8.2.3
# Logging
structlog==24.1.0

@@ -0,0 +1,186 @@
# Meeting Intelligence System - Full Docker Compose
# Deploy on Netcup RS 8000 at /opt/meeting-intelligence/
#
# Components:
# - Jibri (recording)
# - Transcriber (whisper.cpp + diarization)
# - Meeting Intelligence API
# - PostgreSQL (storage)
# - Redis (job queue)
services:
# ============================================================
# PostgreSQL Database
# ============================================================
postgres:
image: pgvector/pgvector:pg16
container_name: meeting-intelligence-db
restart: unless-stopped
environment:
POSTGRES_USER: meeting_intelligence
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
POSTGRES_DB: meeting_intelligence
volumes:
- postgres_data:/var/lib/postgresql/data
- ./postgres/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
healthcheck:
test: ["CMD-SHELL", "pg_isready -U meeting_intelligence"]
interval: 10s
timeout: 5s
retries: 5
networks:
- meeting-intelligence
# ============================================================
# Redis Job Queue
# ============================================================
redis:
image: redis:7-alpine
container_name: meeting-intelligence-redis
restart: unless-stopped
command: redis-server --appendonly yes
volumes:
- redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
networks:
- meeting-intelligence
# ============================================================
# Transcription Service (whisper.cpp + diarization)
# ============================================================
transcriber:
build:
context: ./transcriber
dockerfile: Dockerfile
container_name: meeting-intelligence-transcriber
restart: unless-stopped
environment:
REDIS_URL: redis://redis:6379
POSTGRES_URL: postgresql://meeting_intelligence:${POSTGRES_PASSWORD:-changeme}@postgres:5432/meeting_intelligence
WHISPER_MODEL: small
WHISPER_THREADS: 8
NUM_WORKERS: 4
volumes:
- recordings:/recordings:ro
- audio_processed:/audio
- whisper_models:/models
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
deploy:
resources:
limits:
cpus: '12'
memory: 16G
networks:
- meeting-intelligence
# ============================================================
# Meeting Intelligence API
# ============================================================
api:
build:
context: ./api
dockerfile: Dockerfile
container_name: meeting-intelligence-api
restart: unless-stopped
environment:
REDIS_URL: redis://redis:6379
POSTGRES_URL: postgresql://meeting_intelligence:${POSTGRES_PASSWORD:-changeme}@postgres:5432/meeting_intelligence
OLLAMA_URL: http://host.docker.internal:11434
RECORDINGS_PATH: /recordings
SECRET_KEY: ${API_SECRET_KEY:-changeme}
volumes:
- recordings:/recordings
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
labels:
- "traefik.enable=true"
- "traefik.http.routers.meeting-intelligence.rule=Host(`meet.jeffemmett.com`) && PathPrefix(`/api/intelligence`)"
- "traefik.http.services.meeting-intelligence.loadbalancer.server.port=8000"
- "traefik.http.routers.meeting-intelligence.middlewares=strip-intelligence-prefix"
- "traefik.http.middlewares.strip-intelligence-prefix.stripprefix.prefixes=/api/intelligence"
networks:
- meeting-intelligence
- traefik-public
# ============================================================
# Jibri Recording Service
# ============================================================
jibri:
image: jitsi/jibri:stable-9584
container_name: meeting-intelligence-jibri
restart: unless-stopped
privileged: true
environment:
# XMPP Connection
XMPP_SERVER: ${XMPP_SERVER:-meet.jeffemmett.com}
XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jeffemmett.com}
XMPP_AUTH_DOMAIN: auth.${XMPP_DOMAIN:-meet.jeffemmett.com}
XMPP_INTERNAL_MUC_DOMAIN: internal.auth.${XMPP_DOMAIN:-meet.jeffemmett.com}
XMPP_RECORDER_DOMAIN: recorder.${XMPP_DOMAIN:-meet.jeffemmett.com}
XMPP_MUC_DOMAIN: muc.${XMPP_DOMAIN:-meet.jeffemmett.com}
# Jibri Settings
JIBRI_BREWERY_MUC: JibriBrewery
JIBRI_PENDING_TIMEOUT: 90
JIBRI_RECORDING_DIR: /recordings
JIBRI_FINALIZE_RECORDING_SCRIPT_PATH: /config/finalize.sh
JIBRI_XMPP_USER: jibri
JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD:-changeme}
JIBRI_RECORDER_USER: recorder
JIBRI_RECORDER_PASSWORD: ${JIBRI_RECORDER_PASSWORD:-changeme}
# Display Settings
DISPLAY: ":0"
CHROMIUM_FLAGS: --use-fake-ui-for-media-stream,--start-maximized,--kiosk,--enabled,--disable-infobars,--autoplay-policy=no-user-gesture-required
# Public URL
PUBLIC_URL: https://${XMPP_DOMAIN:-meet.jeffemmett.com}
# Timezone
TZ: UTC
volumes:
- recordings:/recordings
- ./jibri/config:/config
- /dev/shm:/dev/shm
cap_add:
- SYS_ADMIN
- NET_BIND_SERVICE
security_opt:
- seccomp:unconfined
shm_size: 2gb
networks:
- meeting-intelligence
volumes:
postgres_data:
redis_data:
recordings:
driver: local
driver_opts:
type: none
o: bind
device: /opt/meetings/recordings
audio_processed:
driver: local
driver_opts:
type: none
o: bind
device: /opt/meetings/audio
whisper_models:
networks:
meeting-intelligence:
driver: bridge
traefik-public:
external: true

@@ -0,0 +1,104 @@
#!/bin/bash
# Jibri Recording Finalize Script
# Called when Jibri finishes recording a meeting
#
# Arguments:
# $1 - Recording directory path (e.g., /recordings/<conference_id>/<timestamp>)
#
# This script:
# 1. Finds the recording file
# 2. Notifies the Meeting Intelligence API to start processing
set -e
RECORDING_DIR="$1"
API_URL="${MEETING_INTELLIGENCE_API:-http://api:8000}"
LOG_FILE="/var/log/jibri/finalize.log"
log() {
echo "[$(date -Iseconds)] $1" >> "$LOG_FILE"
echo "[$(date -Iseconds)] $1"
}
log "=== Finalize script started ==="
log "Recording directory: $RECORDING_DIR"
# Validate recording directory
if [ -z "$RECORDING_DIR" ] || [ ! -d "$RECORDING_DIR" ]; then
log "ERROR: Invalid recording directory: $RECORDING_DIR"
exit 1
fi
# Find the recording file (MP4 or WebM)
RECORDING_FILE=$(find "$RECORDING_DIR" -type f \( -name "*.mp4" -o -name "*.webm" \) | head -1)
if [ -z "$RECORDING_FILE" ]; then
log "ERROR: No recording file found in $RECORDING_DIR"
exit 1
fi
log "Found recording file: $RECORDING_FILE"
# Get file info
FILE_SIZE=$(stat -c%s "$RECORDING_FILE" 2>/dev/null || echo "0")
log "Recording file size: $FILE_SIZE bytes"
# Extract conference info from path
# Expected format: /recordings/<conference_id>/<timestamp>/recording.mp4
CONFERENCE_ID=$(echo "$RECORDING_DIR" | awk -F'/' '{print $(NF-1)}')
if [ -z "$CONFERENCE_ID" ]; then
CONFERENCE_ID=$(basename "$(dirname "$RECORDING_DIR")")
fi
# Look for metadata file (Jibri sometimes creates this)
METADATA_FILE="$RECORDING_DIR/metadata.json"
if [ -f "$METADATA_FILE" ]; then
log "Found metadata file: $METADATA_FILE"
METADATA=$(cat "$METADATA_FILE")
else
METADATA="{}"
fi
# Prepare webhook payload
PAYLOAD=$(cat <<EOF
{
"event_type": "recording_completed",
"conference_id": "$CONFERENCE_ID",
"recording_path": "$RECORDING_FILE",
"recording_dir": "$RECORDING_DIR",
"file_size_bytes": $FILE_SIZE,
"completed_at": "$(date -Iseconds)",
"metadata": $METADATA
}
EOF
)
log "Sending webhook to $API_URL/webhooks/recording-complete"
log "Payload: $PAYLOAD"
# Send webhook to Meeting Intelligence API
RESPONSE=$(curl -s -w "\n%{http_code}" \
-X POST \
-H "Content-Type: application/json" \
-d "$PAYLOAD" \
"$API_URL/webhooks/recording-complete" 2>&1)
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)
if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "201" ] || [ "$HTTP_CODE" = "202" ]; then
log "SUCCESS: Webhook accepted (HTTP $HTTP_CODE)"
log "Response: $BODY"
else
log "WARNING: Webhook returned HTTP $HTTP_CODE"
log "Response: $BODY"
# Don't fail the script - the recording is still saved
# The API can be retried later
fi
# Optional: Clean up old recordings (keep last 30 days)
# find /recordings -type f -mtime +30 -delete
log "=== Finalize script completed ==="
exit 0

@@ -0,0 +1,310 @@
-- Meeting Intelligence System - PostgreSQL Schema
-- Uses pgvector extension for semantic search
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "vector";
-- ============================================================
-- Meetings Table
-- ============================================================
CREATE TABLE meetings (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
conference_id VARCHAR(255) NOT NULL,
conference_name VARCHAR(255),
title VARCHAR(500),
started_at TIMESTAMP WITH TIME ZONE,
ended_at TIMESTAMP WITH TIME ZONE,
duration_seconds INTEGER,
recording_path VARCHAR(1000),
audio_path VARCHAR(1000),
status VARCHAR(50) DEFAULT 'recording',
-- Status: 'recording', 'extracting_audio', 'transcribing', 'diarizing', 'summarizing', 'ready', 'failed'
error_message TEXT,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_meetings_conference_id ON meetings(conference_id);
CREATE INDEX idx_meetings_status ON meetings(status);
CREATE INDEX idx_meetings_started_at ON meetings(started_at DESC);
CREATE INDEX idx_meetings_created_at ON meetings(created_at DESC);
-- ============================================================
-- Meeting Participants
-- ============================================================
CREATE TABLE meeting_participants (
id SERIAL PRIMARY KEY,
meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
participant_id VARCHAR(255) NOT NULL,
display_name VARCHAR(255),
email VARCHAR(255),
joined_at TIMESTAMP WITH TIME ZONE,
left_at TIMESTAMP WITH TIME ZONE,
duration_seconds INTEGER,
is_moderator BOOLEAN DEFAULT FALSE,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_participants_meeting_id ON meeting_participants(meeting_id);
CREATE INDEX idx_participants_participant_id ON meeting_participants(participant_id);
-- ============================================================
-- Transcripts
-- ============================================================
CREATE TABLE transcripts (
id SERIAL PRIMARY KEY,
meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
segment_index INTEGER NOT NULL,
start_time FLOAT NOT NULL,
end_time FLOAT NOT NULL,
speaker_id VARCHAR(255),
speaker_name VARCHAR(255),
speaker_label VARCHAR(50), -- e.g., "Speaker 1", "Speaker 2"
text TEXT NOT NULL,
confidence FLOAT,
language VARCHAR(10) DEFAULT 'en',
word_timestamps JSONB, -- Array of {word, start, end, confidence}
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_transcripts_meeting_id ON transcripts(meeting_id);
CREATE INDEX idx_transcripts_speaker_id ON transcripts(speaker_id);
CREATE INDEX idx_transcripts_start_time ON transcripts(meeting_id, start_time);
CREATE INDEX idx_transcripts_text_search ON transcripts USING gin(to_tsvector('english', text));
-- ============================================================
-- Transcript Embeddings (for semantic search)
-- ============================================================
CREATE TABLE transcript_embeddings (
id SERIAL PRIMARY KEY,
transcript_id INTEGER NOT NULL REFERENCES transcripts(id) ON DELETE CASCADE,
meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
embedding vector(384), -- all-MiniLM-L6-v2 dimensions
chunk_text TEXT, -- The text chunk this embedding represents
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_embeddings_transcript_id ON transcript_embeddings(transcript_id);
CREATE INDEX idx_embeddings_meeting_id ON transcript_embeddings(meeting_id);
CREATE INDEX idx_embeddings_vector ON transcript_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- ============================================================
-- AI Summaries
-- ============================================================
CREATE TABLE summaries (
id SERIAL PRIMARY KEY,
meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
summary_text TEXT,
key_points JSONB, -- Array of key point strings
action_items JSONB, -- Array of {task, assignee, due_date, completed}
decisions JSONB, -- Array of decision strings
topics JSONB, -- Array of {topic, duration_seconds, relevance_score}
sentiment VARCHAR(50), -- 'positive', 'neutral', 'negative', 'mixed'
sentiment_scores JSONB, -- {positive: 0.7, neutral: 0.2, negative: 0.1}
participants_summary JSONB, -- {participant_id: {speaking_time, word_count, topics}}
model_used VARCHAR(100),
prompt_tokens INTEGER,
completion_tokens INTEGER,
generated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
metadata JSONB DEFAULT '{}'
);
CREATE INDEX idx_summaries_meeting_id ON summaries(meeting_id);
CREATE INDEX idx_summaries_generated_at ON summaries(generated_at DESC);
-- ============================================================
-- Processing Jobs Queue
-- ============================================================
CREATE TABLE processing_jobs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), -- UUID so the transcriber can return the same job id it writes here
meeting_id UUID NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
job_type VARCHAR(50) NOT NULL, -- 'extract_audio', 'transcribe', 'diarize', 'summarize', 'embed'
status VARCHAR(50) DEFAULT 'pending', -- 'pending', 'processing', 'completed', 'failed', 'cancelled'
priority INTEGER DEFAULT 5, -- 1 = highest, 10 = lowest
attempts INTEGER DEFAULT 0,
max_attempts INTEGER DEFAULT 3,
started_at TIMESTAMP WITH TIME ZONE,
completed_at TIMESTAMP WITH TIME ZONE,
error_message TEXT,
result JSONB,
worker_id VARCHAR(100),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_jobs_meeting_id ON processing_jobs(meeting_id);
CREATE INDEX idx_jobs_status ON processing_jobs(status, priority, created_at);
CREATE INDEX idx_jobs_type_status ON processing_jobs(job_type, status);
-- ============================================================
-- Search History (for analytics)
-- ============================================================
CREATE TABLE search_history (
id SERIAL PRIMARY KEY,
user_id VARCHAR(255),
query TEXT NOT NULL,
search_type VARCHAR(50), -- 'text', 'semantic', 'combined'
results_count INTEGER,
meeting_ids UUID[],
filters JSONB,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_search_history_created_at ON search_history(created_at DESC);
-- ============================================================
-- Webhook Events (for Jibri callbacks)
-- ============================================================
CREATE TABLE webhook_events (
id SERIAL PRIMARY KEY,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
processed BOOLEAN DEFAULT FALSE,
processed_at TIMESTAMP WITH TIME ZONE,
error_message TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_webhooks_processed ON webhook_events(processed, created_at);
-- ============================================================
-- Functions
-- ============================================================
-- Update timestamp trigger
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER meetings_updated_at
BEFORE UPDATE ON meetings
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER jobs_updated_at
BEFORE UPDATE ON processing_jobs
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();
-- Semantic search function
CREATE OR REPLACE FUNCTION semantic_search(
query_embedding vector(384),
match_threshold FLOAT DEFAULT 0.7,
match_count INT DEFAULT 10,
meeting_filter UUID DEFAULT NULL
)
RETURNS TABLE (
transcript_id INT,
meeting_id UUID,
chunk_text TEXT,
similarity FLOAT
) AS $$
BEGIN
RETURN QUERY
SELECT
te.transcript_id,
te.meeting_id,
te.chunk_text,
1 - (te.embedding <=> query_embedding) AS similarity
FROM transcript_embeddings te
WHERE
(meeting_filter IS NULL OR te.meeting_id = meeting_filter)
AND 1 - (te.embedding <=> query_embedding) > match_threshold
ORDER BY te.embedding <=> query_embedding
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
-- Full-text search function
CREATE OR REPLACE FUNCTION fulltext_search(
search_query TEXT,
meeting_filter UUID DEFAULT NULL,
match_count INT DEFAULT 50
)
RETURNS TABLE (
transcript_id INT,
meeting_id UUID,
text TEXT,
speaker_name VARCHAR,
start_time FLOAT,
rank FLOAT
) AS $$
BEGIN
RETURN QUERY
SELECT
t.id,
t.meeting_id,
t.text,
t.speaker_name,
t.start_time,
ts_rank(to_tsvector('english', t.text), plainto_tsquery('english', search_query)) AS rank
FROM transcripts t
WHERE
(meeting_filter IS NULL OR t.meeting_id = meeting_filter)
AND to_tsvector('english', t.text) @@ plainto_tsquery('english', search_query)
ORDER BY rank DESC
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
-- ============================================================
-- Views
-- ============================================================
-- Meeting overview with stats
CREATE VIEW meeting_overview AS
SELECT
m.id,
m.conference_id,
m.conference_name,
m.title,
m.started_at,
m.ended_at,
m.duration_seconds,
m.status,
m.recording_path,
COUNT(DISTINCT mp.id) AS participant_count,
COUNT(DISTINCT t.id) AS transcript_segment_count,
COALESCE(SUM(LENGTH(t.text)), 0) AS total_characters,
s.id IS NOT NULL AS has_summary,
m.created_at
FROM meetings m
LEFT JOIN meeting_participants mp ON m.id = mp.meeting_id
LEFT JOIN transcripts t ON m.id = t.meeting_id
LEFT JOIN summaries s ON m.id = s.meeting_id
GROUP BY m.id, s.id;
-- Speaker stats per meeting
CREATE VIEW speaker_stats AS
SELECT
t.meeting_id,
t.speaker_id,
t.speaker_name,
t.speaker_label,
COUNT(*) AS segment_count,
SUM(t.end_time - t.start_time) AS speaking_time_seconds,
SUM(LENGTH(t.text)) AS character_count,
SUM(array_length(regexp_split_to_array(t.text, '\s+'), 1)) AS word_count
FROM transcripts t
GROUP BY t.meeting_id, t.speaker_id, t.speaker_name, t.speaker_label;
-- ============================================================
-- Sample Data (for testing - remove in production)
-- ============================================================
-- INSERT INTO meetings (conference_id, conference_name, title, started_at, status)
-- VALUES ('test-room-123', 'Test Room', 'Test Meeting', NOW() - INTERVAL '1 hour', 'ready');
COMMENT ON TABLE meetings IS 'Stores meeting metadata and processing status';
COMMENT ON TABLE transcripts IS 'Stores time-stamped transcript segments with speaker attribution';
COMMENT ON TABLE summaries IS 'Stores AI-generated meeting summaries and extracted information';
COMMENT ON TABLE transcript_embeddings IS 'Stores vector embeddings for semantic search';
COMMENT ON TABLE processing_jobs IS 'Job queue for async processing tasks';
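-- Example queries (illustrative, kept commented out like the sample data above;
-- the semantic_search input would normally be an embedding produced by the API,
-- so an existing row is reused here rather than a hand-typed 384-dim vector):
-- SELECT * FROM fulltext_search('quarterly budget');
-- SELECT * FROM semantic_search((SELECT embedding FROM transcript_embeddings LIMIT 1), 0.7, 10);
-- SELECT * FROM meeting_overview ORDER BY started_at DESC LIMIT 20;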

View File

@ -0,0 +1,67 @@
# Meeting Intelligence Transcription Service
# Uses whisper.cpp for fast CPU-based transcription
# Uses resemblyzer for speaker diarization
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
# Build whisper.cpp
WORKDIR /build
RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
cd whisper.cpp && \
cmake -B build -DWHISPER_BUILD_EXAMPLES=ON && \
cmake --build build --config Release -j$(nproc) && \
cp build/bin/whisper-cli /usr/local/bin/whisper && \
cp build/bin/whisper-server /usr/local/bin/whisper-server 2>/dev/null || true
# Download whisper models
WORKDIR /models
RUN cd /build/whisper.cpp && \
bash models/download-ggml-model.sh small && \
mv models/ggml-small.bin /models/
# Production image
FROM python:3.11-slim
# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
libsndfile1 \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy whisper binary and models
COPY --from=builder /usr/local/bin/whisper /usr/local/bin/whisper
COPY --from=builder /models /models
# Set up Python environment
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app/ ./app/
# Create directories
RUN mkdir -p /recordings /audio /logs
# Environment variables
ENV PYTHONUNBUFFERED=1
ENV WHISPER_MODEL=/models/ggml-small.bin
ENV WHISPER_THREADS=8
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8001/health || exit 1
# Run the service
EXPOSE 8001
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001", "--workers", "1"]

View File

@ -0,0 +1 @@
# Meeting Intelligence Transcription Service

View File

@ -0,0 +1,45 @@
"""
Configuration settings for the Transcription Service.
"""
import os
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
"""Application settings loaded from environment variables."""
# Redis configuration
redis_url: str = "redis://localhost:6379"
# PostgreSQL configuration
postgres_url: str = "postgresql://meeting_intelligence:changeme@localhost:5432/meeting_intelligence"
# Whisper configuration
whisper_model: str = "/models/ggml-small.bin"
whisper_threads: int = 8
whisper_language: str = "en"
# Worker configuration
num_workers: int = 4
job_timeout: int = 7200 # 2 hours in seconds
# Audio processing
audio_sample_rate: int = 16000
audio_channels: int = 1
# Diarization settings
min_speaker_duration: float = 0.5 # Minimum speaker segment in seconds
max_speakers: int = 10
# Paths
recordings_path: str = "/recordings"
audio_output_path: str = "/audio"
temp_path: str = "/tmp/transcriber"
class Config:
env_file = ".env"
env_file_encoding = "utf-8"
settings = Settings()
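# Example: every field above can be overridden via environment variables
# (pydantic-settings matches field names to env vars case-insensitively), e.g.:
#   WHISPER_THREADS=16 NUM_WORKERS=2 POSTGRES_URL=postgresql://user:pass@db:5432/meeting_intelligence uvicorn app.main:app
# The DSN shown is illustrative.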

View File

@ -0,0 +1,245 @@
"""
Database operations for the Transcription Service.
"""
import json
import uuid
from typing import Optional, List, Dict, Any
import asyncpg
import structlog
log = structlog.get_logger()
async def _init_connection(conn):
"""Register JSON codecs so Python dicts can be bound to json/jsonb columns and decoded back to dicts."""
await conn.set_type_codec("jsonb", encoder=json.dumps, decoder=json.loads, schema="pg_catalog")
await conn.set_type_codec("json", encoder=json.dumps, decoder=json.loads, schema="pg_catalog")
class Database:
"""Database operations for transcription service."""
def __init__(self, connection_string: str):
self.connection_string = connection_string
self.pool: Optional[asyncpg.Pool] = None
async def connect(self):
"""Establish database connection pool."""
log.info("Connecting to database...")
self.pool = await asyncpg.create_pool(
self.connection_string,
min_size=2,
max_size=10,
init=_init_connection
)
log.info("Database connected")
async def disconnect(self):
"""Close database connection pool."""
if self.pool:
await self.pool.close()
log.info("Database disconnected")
async def health_check(self):
"""Check database connectivity."""
async with self.pool.acquire() as conn:
await conn.fetchval("SELECT 1")
async def create_transcription_job(
self,
meeting_id: str,
audio_path: Optional[str] = None,
video_path: Optional[str] = None,
enable_diarization: bool = True,
language: Optional[str] = None,
priority: int = 5
) -> str:
"""Create a new transcription job."""
job_id = str(uuid.uuid4())
async with self.pool.acquire() as conn:
await conn.execute("""
INSERT INTO processing_jobs (
id, meeting_id, job_type, status, priority,
result
)
VALUES ($1, $2::uuid, 'transcribe', 'pending', $3, $4)
""", job_id, meeting_id, priority, {
"audio_path": audio_path,
"video_path": video_path,
"enable_diarization": enable_diarization,
"language": language
})
log.info("Created transcription job", job_id=job_id, meeting_id=meeting_id)
return job_id
async def get_job(self, job_id: str) -> Optional[Dict[str, Any]]:
"""Get a job by ID."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT id, meeting_id, job_type, status, priority,
attempts, started_at, completed_at,
error_message, result, created_at
FROM processing_jobs
WHERE id = $1
""", job_id)
if row:
return dict(row)
return None
async def get_next_pending_job(self) -> Optional[Dict[str, Any]]:
"""Get the next pending job and mark it as processing."""
async with self.pool.acquire() as conn:
# Use FOR UPDATE SKIP LOCKED to prevent race conditions
row = await conn.fetchrow("""
UPDATE processing_jobs
SET status = 'processing',
started_at = NOW(),
attempts = attempts + 1
WHERE id = (
SELECT id FROM processing_jobs
WHERE status = 'pending'
AND job_type = 'transcribe'
ORDER BY priority ASC, created_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 1
)
RETURNING id, meeting_id, job_type, result
""")
if row:
result = dict(row)
# Merge result JSON into the dict
if result.get("result"):
result.update(result["result"])
return result
return None
async def update_job_status(
self,
job_id: str,
status: str,
error_message: Optional[str] = None,
result: Optional[dict] = None,
progress: Optional[float] = None
):
"""Update job status."""
async with self.pool.acquire() as conn:
if status == "completed":
await conn.execute("""
UPDATE processing_jobs
SET status = $1,
completed_at = NOW(),
error_message = $2,
result = COALESCE($3::jsonb, result)
WHERE id = $4
""", status, error_message, result, job_id)
else:
update_result = result
if progress is not None:
update_result = result or {}
update_result["progress"] = progress
await conn.execute("""
UPDATE processing_jobs
SET status = $1,
error_message = $2,
result = COALESCE($3::jsonb, result)
WHERE id = $4
""", status, error_message, update_result, job_id)
async def update_job_audio_path(self, job_id: str, audio_path: str):
"""Update the audio path for a job."""
async with self.pool.acquire() as conn:
await conn.execute("""
UPDATE processing_jobs
SET result = result || $1::jsonb
WHERE id = $2
""", {"audio_path": audio_path}, job_id)
async def update_meeting_status(self, meeting_id: str, status: str):
"""Update meeting processing status."""
async with self.pool.acquire() as conn:
await conn.execute("""
UPDATE meetings
SET status = $1,
updated_at = NOW()
WHERE id = $2::uuid
""", status, meeting_id)
async def insert_transcript_segment(
self,
meeting_id: str,
segment_index: int,
start_time: float,
end_time: float,
text: str,
speaker_id: Optional[str] = None,
speaker_label: Optional[str] = None,
confidence: Optional[float] = None,
language: str = "en"
):
"""Insert a transcript segment."""
async with self.pool.acquire() as conn:
await conn.execute("""
INSERT INTO transcripts (
meeting_id, segment_index, start_time, end_time,
text, speaker_id, speaker_label, confidence, language
)
VALUES ($1::uuid, $2, $3, $4, $5, $6, $7, $8, $9)
""", meeting_id, segment_index, start_time, end_time,
text, speaker_id, speaker_label, confidence, language)
async def get_transcript(self, meeting_id: str) -> List[Dict[str, Any]]:
"""Get all transcript segments for a meeting."""
async with self.pool.acquire() as conn:
rows = await conn.fetch("""
SELECT id, segment_index, start_time, end_time,
speaker_id, speaker_label, text, confidence, language
FROM transcripts
WHERE meeting_id = $1::uuid
ORDER BY segment_index ASC
""", meeting_id)
return [dict(row) for row in rows]
async def get_meeting(self, meeting_id: str) -> Optional[Dict[str, Any]]:
"""Get meeting details."""
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT id, conference_id, conference_name, title,
started_at, ended_at, duration_seconds,
recording_path, audio_path, status,
metadata, created_at
FROM meetings
WHERE id = $1::uuid
""", meeting_id)
if row:
return dict(row)
return None
async def create_meeting(
self,
conference_id: str,
conference_name: Optional[str] = None,
title: Optional[str] = None,
recording_path: Optional[str] = None,
metadata: Optional[dict] = None
) -> str:
"""Create a new meeting record."""
meeting_id = str(uuid.uuid4())
async with self.pool.acquire() as conn:
await conn.execute("""
INSERT INTO meetings (
id, conference_id, conference_name, title,
recording_path, status, metadata
)
VALUES ($1, $2, $3, $4, $5, 'recording', $6)
""", meeting_id, conference_id, conference_name, title,
recording_path, metadata or {})
log.info("Created meeting", meeting_id=meeting_id, conference_id=conference_id)
return meeting_id
class DatabaseError(Exception):
"""Database operation error."""
pass
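# Minimal usage sketch (assumes a reachable PostgreSQL with the schema loaded;
# the DSN and audio path are illustrative):
#
#   async def example():
#       db = Database("postgresql://meeting_intelligence:changeme@localhost:5432/meeting_intelligence")
#       await db.connect()
#       meeting_id = await db.create_meeting("test-room-123", title="Test Meeting")
#       job_id = await db.create_transcription_job(meeting_id, audio_path="/audio/test.wav")
#       print(await db.get_job(job_id))
#       await db.disconnect()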

View File

@ -0,0 +1,338 @@
"""
Speaker Diarization using resemblyzer.
Identifies who spoke when in the audio.
"""
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple
import numpy as np
import soundfile as sf
from resemblyzer import VoiceEncoder, preprocess_wav
from sklearn.cluster import AgglomerativeClustering
import structlog
log = structlog.get_logger()
@dataclass
class SpeakerSegment:
"""A segment attributed to a speaker."""
start: float
end: float
speaker_id: str
speaker_label: str # e.g., "Speaker 1"
confidence: Optional[float] = None
class SpeakerDiarizer:
"""Speaker diarization using voice embeddings."""
def __init__(
self,
min_segment_duration: float = 0.5,
max_speakers: int = 10,
embedding_step: float = 0.5 # Step size for embeddings in seconds
):
self.min_segment_duration = min_segment_duration
self.max_speakers = max_speakers
self.embedding_step = embedding_step
# Load voice encoder (this downloads the model on first use)
log.info("Loading voice encoder model...")
self.encoder = VoiceEncoder()
log.info("Voice encoder loaded")
def diarize(
self,
audio_path: str,
num_speakers: Optional[int] = None,
transcript_segments: Optional[List[dict]] = None
) -> List[SpeakerSegment]:
"""
Perform speaker diarization on an audio file.
Args:
audio_path: Path to audio file (WAV, 16kHz mono)
num_speakers: Number of speakers (if known), otherwise auto-detected
transcript_segments: Optional transcript segments to align with
Returns:
List of SpeakerSegment with speaker attributions
"""
if not os.path.exists(audio_path):
raise FileNotFoundError(f"Audio file not found: {audio_path}")
log.info("Starting speaker diarization", audio_path=audio_path)
# Load and preprocess audio
wav, sample_rate = sf.read(audio_path)
if sample_rate != 16000:
log.warning(f"Audio sample rate is {sample_rate}, expected 16000")
# Ensure mono
if len(wav.shape) > 1:
wav = wav.mean(axis=1)
# Preprocess for resemblyzer
wav = preprocess_wav(wav)
if len(wav) == 0:
log.warning("Audio file is empty after preprocessing")
return []
# Generate embeddings for sliding windows
embeddings, timestamps = self._generate_embeddings(wav, sample_rate)
if len(embeddings) == 0:
log.warning("No embeddings generated")
return []
# Cluster embeddings to identify speakers
speaker_labels = self._cluster_speakers(
embeddings,
num_speakers=num_speakers
)
# Convert to speaker segments
segments = self._create_segments(timestamps, speaker_labels)
# If transcript segments provided, align them
if transcript_segments:
segments = self._align_with_transcript(segments, transcript_segments)
log.info(
"Diarization complete",
num_segments=len(segments),
num_speakers=len(set(s.speaker_id for s in segments))
)
return segments
def _generate_embeddings(
self,
wav: np.ndarray,
sample_rate: int
) -> Tuple[np.ndarray, List[float]]:
"""Generate voice embeddings for sliding windows."""
embeddings = []
timestamps = []
# Window size in samples (1.5 seconds for good speaker representation)
window_size = int(1.5 * sample_rate)
step_size = int(self.embedding_step * sample_rate)
# Slide through audio
for start_sample in range(0, len(wav) - window_size, step_size):
end_sample = start_sample + window_size
window = wav[start_sample:end_sample]
# Get embedding for this window
try:
embedding = self.encoder.embed_utterance(window)
embeddings.append(embedding)
timestamps.append(start_sample / sample_rate)
except Exception as e:
log.debug(f"Failed to embed window at {start_sample/sample_rate}s: {e}")
continue
return np.array(embeddings), timestamps
def _cluster_speakers(
self,
embeddings: np.ndarray,
num_speakers: Optional[int] = None
) -> np.ndarray:
"""Cluster embeddings to identify speakers."""
if len(embeddings) == 0:
return np.array([])
# If number of speakers not specified, estimate it
if num_speakers is None:
num_speakers = self._estimate_num_speakers(embeddings)
# Ensure we don't exceed max speakers or embedding count
num_speakers = min(num_speakers, self.max_speakers, len(embeddings))
num_speakers = max(num_speakers, 1)
log.info(f"Clustering with {num_speakers} speakers")
# Use agglomerative clustering
clustering = AgglomerativeClustering(
n_clusters=num_speakers,
metric="cosine",
linkage="average"
)
labels = clustering.fit_predict(embeddings)
return labels
def _estimate_num_speakers(self, embeddings: np.ndarray) -> int:
"""Estimate the number of speakers from embeddings."""
if len(embeddings) < 2:
return 1
# Try different numbers of clusters and find the best
best_score = -1
best_n = 2
for n in range(2, min(6, len(embeddings))):
try:
clustering = AgglomerativeClustering(
n_clusters=n,
metric="cosine",
linkage="average"
)
labels = clustering.fit_predict(embeddings)
# Calculate silhouette-like score
score = self._cluster_quality_score(embeddings, labels)
if score > best_score:
best_score = score
best_n = n
except Exception:
continue
log.info(f"Estimated {best_n} speakers (score: {best_score:.3f})")
return best_n
def _cluster_quality_score(
self,
embeddings: np.ndarray,
labels: np.ndarray
) -> float:
"""Calculate a simple cluster quality score."""
unique_labels = np.unique(labels)
if len(unique_labels) < 2:
return 0.0
# Calculate average intra-cluster distance
intra_distances = []
for label in unique_labels:
cluster_embeddings = embeddings[labels == label]
if len(cluster_embeddings) > 1:
# Cosine distance within cluster
for i in range(len(cluster_embeddings)):
for j in range(i + 1, len(cluster_embeddings)):
dist = 1 - np.dot(cluster_embeddings[i], cluster_embeddings[j])
intra_distances.append(dist)
if not intra_distances:
return 0.0
avg_intra = np.mean(intra_distances)
# Calculate average inter-cluster distance
inter_distances = []
cluster_centers = []
for label in unique_labels:
cluster_embeddings = embeddings[labels == label]
center = cluster_embeddings.mean(axis=0)
cluster_centers.append(center)
for i in range(len(cluster_centers)):
for j in range(i + 1, len(cluster_centers)):
dist = 1 - np.dot(cluster_centers[i], cluster_centers[j])
inter_distances.append(dist)
avg_inter = np.mean(inter_distances) if inter_distances else 1.0
# Score: higher inter-cluster distance, lower intra-cluster distance is better
return (avg_inter - avg_intra) / max(avg_inter, avg_intra, 0.001)
def _create_segments(
self,
timestamps: List[float],
labels: np.ndarray
) -> List[SpeakerSegment]:
"""Convert clustered timestamps to speaker segments."""
if len(timestamps) == 0:
return []
segments = []
current_speaker = labels[0]
segment_start = timestamps[0]
for i in range(1, len(timestamps)):
if labels[i] != current_speaker:
# End current segment
segment_end = timestamps[i]
if segment_end - segment_start >= self.min_segment_duration:
segments.append(SpeakerSegment(
start=segment_start,
end=segment_end,
speaker_id=f"speaker_{current_speaker}",
speaker_label=f"Speaker {current_speaker + 1}"
))
# Start new segment
current_speaker = labels[i]
segment_start = timestamps[i]
# Add final segment
if len(timestamps) > 0:
segment_end = timestamps[-1] + self.embedding_step
if segment_end - segment_start >= self.min_segment_duration:
segments.append(SpeakerSegment(
start=segment_start,
end=segment_end,
speaker_id=f"speaker_{current_speaker}",
speaker_label=f"Speaker {current_speaker + 1}"
))
return segments
def _align_with_transcript(
self,
speaker_segments: List[SpeakerSegment],
transcript_segments: List[dict]
) -> List[SpeakerSegment]:
"""Align speaker segments with transcript segments."""
aligned = []
for trans in transcript_segments:
trans_start = trans.get("start", 0)
trans_end = trans.get("end", 0)
trans_mid = (trans_start + trans_end) / 2
# Find the speaker segment that best overlaps
best_speaker = None
best_overlap = 0
for speaker in speaker_segments:
# Calculate overlap
overlap_start = max(trans_start, speaker.start)
overlap_end = min(trans_end, speaker.end)
overlap = max(0, overlap_end - overlap_start)
if overlap > best_overlap:
best_overlap = overlap
best_speaker = speaker
if best_speaker:
aligned.append(SpeakerSegment(
start=trans_start,
end=trans_end,
speaker_id=best_speaker.speaker_id,
speaker_label=best_speaker.speaker_label,
confidence=best_overlap / (trans_end - trans_start) if trans_end > trans_start else 0
))
else:
# No match, assign unknown speaker
aligned.append(SpeakerSegment(
start=trans_start,
end=trans_end,
speaker_id="speaker_unknown",
speaker_label="Unknown Speaker",
confidence=0
))
return aligned
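# Minimal usage sketch (the audio path is illustrative; input is expected to be
# a 16 kHz mono WAV, as produced by the processor's ffmpeg step):
#
#   diarizer = SpeakerDiarizer(max_speakers=4)
#   segments = diarizer.diarize("/audio/meeting/audio.wav")
#   for seg in segments:
#       print(f"{seg.speaker_label}: {seg.start:.1f}s - {seg.end:.1f}s")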

View File

@ -0,0 +1,274 @@
"""
Meeting Intelligence Transcription Service
FastAPI service that handles:
- Audio extraction from video recordings
- Transcription using whisper.cpp
- Speaker diarization using resemblyzer
- Job queue management via Redis
"""
import asyncio
import os
from contextlib import asynccontextmanager
from typing import Optional
from fastapi import FastAPI, BackgroundTasks, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from redis import Redis
from rq import Queue
from .config import settings
from .transcriber import WhisperTranscriber
from .diarizer import SpeakerDiarizer
from .processor import JobProcessor
from .database import Database
import structlog
log = structlog.get_logger()
# Pydantic models
class TranscribeRequest(BaseModel):
meeting_id: str
audio_path: str
priority: int = 5
enable_diarization: bool = True
language: Optional[str] = None
class TranscribeResponse(BaseModel):
job_id: str
status: str
message: str
class JobStatus(BaseModel):
job_id: str
status: str
progress: Optional[float] = None
result: Optional[dict] = None
error: Optional[str] = None
# Application state
class AppState:
redis: Optional[Redis] = None
queue: Optional[Queue] = None
db: Optional[Database] = None
transcriber: Optional[WhisperTranscriber] = None
diarizer: Optional[SpeakerDiarizer] = None
processor: Optional[JobProcessor] = None
state = AppState()
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application startup and shutdown."""
log.info("Starting transcription service...")
# Initialize Redis connection
state.redis = Redis.from_url(settings.redis_url)
state.queue = Queue("transcription", connection=state.redis)
# Initialize database
state.db = Database(settings.postgres_url)
await state.db.connect()
# Initialize transcriber
state.transcriber = WhisperTranscriber(
model_path=settings.whisper_model,
threads=settings.whisper_threads
)
# Initialize diarizer
state.diarizer = SpeakerDiarizer()
# Initialize job processor
state.processor = JobProcessor(
transcriber=state.transcriber,
diarizer=state.diarizer,
db=state.db,
redis=state.redis
)
# Start background worker
asyncio.create_task(state.processor.process_jobs())
log.info("Transcription service started successfully")
yield
# Shutdown
log.info("Shutting down transcription service...")
if state.processor:
await state.processor.stop()
if state.db:
await state.db.disconnect()
if state.redis:
state.redis.close()
log.info("Transcription service stopped")
app = FastAPI(
title="Meeting Intelligence Transcription Service",
description="Transcription and speaker diarization for meeting recordings",
version="1.0.0",
lifespan=lifespan
)
@app.get("/health")
async def health_check():
"""Health check endpoint."""
redis_ok = False
db_ok = False
try:
if state.redis:
state.redis.ping()
redis_ok = True
except Exception as e:
log.error("Redis health check failed", error=str(e))
try:
if state.db:
await state.db.health_check()
db_ok = True
except Exception as e:
log.error("Database health check failed", error=str(e))
status = "healthy" if (redis_ok and db_ok) else "unhealthy"
return {
"status": status,
"redis": redis_ok,
"database": db_ok,
"whisper_model": settings.whisper_model,
"threads": settings.whisper_threads
}
@app.get("/status")
async def service_status():
"""Get service status and queue info."""
queue_length = state.queue.count if state.queue else 0
processing = state.processor.active_jobs if state.processor else 0
return {
"status": "running",
"queue_length": queue_length,
"active_jobs": processing,
"workers": settings.num_workers,
"model": os.path.basename(settings.whisper_model)
}
@app.post("/transcribe", response_model=TranscribeResponse)
async def queue_transcription(request: TranscribeRequest, background_tasks: BackgroundTasks):
"""Queue a transcription job."""
log.info(
"Received transcription request",
meeting_id=request.meeting_id,
audio_path=request.audio_path
)
# Validate audio file exists
if not os.path.exists(request.audio_path):
raise HTTPException(
status_code=404,
detail=f"Audio file not found: {request.audio_path}"
)
# Create job record in database
try:
job_id = await state.db.create_transcription_job(
meeting_id=request.meeting_id,
audio_path=request.audio_path,
enable_diarization=request.enable_diarization,
language=request.language,
priority=request.priority
)
except Exception as e:
log.error("Failed to create job", error=str(e))
raise HTTPException(status_code=500, detail=str(e))
# Queue the job
state.queue.enqueue(
"app.worker.process_transcription",
job_id,
job_timeout="2h",
result_ttl=86400 # 24 hours
)
log.info("Job queued", job_id=job_id)
return TranscribeResponse(
job_id=job_id,
status="queued",
message="Transcription job queued successfully"
)
@app.get("/transcribe/{job_id}", response_model=JobStatus)
async def get_job_status(job_id: str):
"""Get the status of a transcription job."""
job = await state.db.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
return JobStatus(
job_id=job_id,
status=job["status"],
progress=job.get("progress"),
result=job.get("result"),
error=job.get("error_message")
)
@app.delete("/transcribe/{job_id}")
async def cancel_job(job_id: str):
"""Cancel a pending transcription job."""
job = await state.db.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job["status"] not in ["pending", "queued"]:
raise HTTPException(
status_code=400,
detail=f"Cannot cancel job in status: {job['status']}"
)
await state.db.update_job_status(job_id, "cancelled")
return {"status": "cancelled", "job_id": job_id}
@app.get("/meetings/{meeting_id}/transcript")
async def get_transcript(meeting_id: str):
"""Get the transcript for a meeting."""
transcript = await state.db.get_transcript(meeting_id)
if not transcript:
raise HTTPException(
status_code=404,
detail=f"No transcript found for meeting: {meeting_id}"
)
return {
"meeting_id": meeting_id,
"segments": transcript,
"segment_count": len(transcript)
}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8001)
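# Example request to queue a job once the service is running (the meeting id and
# audio path are placeholders; the file must exist inside the container):
#
#   curl -X POST http://localhost:8001/transcribe \
#     -H "Content-Type: application/json" \
#     -d '{"meeting_id": "<uuid>", "audio_path": "/audio/<meeting>/audio.wav"}'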

View File

@ -0,0 +1,282 @@
"""
Job Processor for the Transcription Service.
Handles the processing pipeline:
1. Audio extraction from video
2. Transcription
3. Speaker diarization
4. Database storage
"""
import asyncio
import os
import subprocess
from typing import Optional
import structlog
from .config import settings
from .transcriber import WhisperTranscriber, TranscriptionResult
from .diarizer import SpeakerDiarizer, SpeakerSegment
from .database import Database
log = structlog.get_logger()
class JobProcessor:
"""Processes transcription jobs from the queue."""
def __init__(
self,
transcriber: WhisperTranscriber,
diarizer: SpeakerDiarizer,
db: Database,
redis
):
self.transcriber = transcriber
self.diarizer = diarizer
self.db = db
self.redis = redis
self.active_jobs = 0
self._running = False
self._workers = []
async def process_jobs(self):
"""Main job processing loop."""
self._running = True
log.info("Job processor started", num_workers=settings.num_workers)
# Start worker tasks
for i in range(settings.num_workers):
worker = asyncio.create_task(self._worker(i))
self._workers.append(worker)
# Wait for all workers
await asyncio.gather(*self._workers, return_exceptions=True)
async def stop(self):
"""Stop the job processor."""
self._running = False
for worker in self._workers:
worker.cancel()
log.info("Job processor stopped")
async def _worker(self, worker_id: int):
"""Worker that processes individual jobs."""
log.info(f"Worker {worker_id} started")
while self._running:
try:
# Get next job from database
job = await self.db.get_next_pending_job()
if job is None:
# No jobs, wait a bit
await asyncio.sleep(2)
continue
job_id = job["id"]
meeting_id = job["meeting_id"]
log.info(
f"Worker {worker_id} processing job",
job_id=job_id,
meeting_id=meeting_id
)
self.active_jobs += 1
try:
await self._process_job(job)
except Exception as e:
log.error(
"Job processing failed",
job_id=job_id,
error=str(e)
)
await self.db.update_job_status(
job_id,
"failed",
error_message=str(e)
)
finally:
self.active_jobs -= 1
except asyncio.CancelledError:
break
except Exception as e:
log.error(f"Worker {worker_id} error", error=str(e))
await asyncio.sleep(5)
log.info(f"Worker {worker_id} stopped")
async def _process_job(self, job: dict):
"""Process a single transcription job."""
job_id = job["id"]
meeting_id = job["meeting_id"]
audio_path = job.get("audio_path")
video_path = job.get("video_path")
enable_diarization = job.get("enable_diarization", True)
language = job.get("language")
# Update status to processing
await self.db.update_job_status(job_id, "processing")
await self.db.update_meeting_status(meeting_id, "transcribing")
# Step 1: Extract audio if we have video
if video_path and not audio_path:
log.info("Extracting audio from video", video_path=video_path)
await self.db.update_job_status(job_id, "processing", progress=0.1)
audio_path = await self._extract_audio(video_path, meeting_id)
await self.db.update_job_audio_path(job_id, audio_path)
if not audio_path or not os.path.exists(audio_path):
raise RuntimeError(f"Audio file not found: {audio_path}")
# Step 2: Transcribe
log.info("Starting transcription", audio_path=audio_path)
await self.db.update_job_status(job_id, "processing", progress=0.3)
transcription = await asyncio.get_event_loop().run_in_executor(
None,
lambda: self.transcriber.transcribe(audio_path, language)
)
log.info(
"Transcription complete",
segments=len(transcription.segments),
duration=transcription.duration
)
# Step 3: Speaker diarization
speaker_segments = []
if enable_diarization and len(transcription.segments) > 0:
log.info("Starting speaker diarization")
await self.db.update_job_status(job_id, "processing", progress=0.6)
await self.db.update_meeting_status(meeting_id, "diarizing")
# Convert transcript segments to dicts for diarizer
transcript_dicts = [
{"start": s.start, "end": s.end, "text": s.text}
for s in transcription.segments
]
speaker_segments = await asyncio.get_event_loop().run_in_executor(
None,
lambda: self.diarizer.diarize(
audio_path,
transcript_segments=transcript_dicts
)
)
log.info(
"Diarization complete",
num_segments=len(speaker_segments),
num_speakers=len(set(s.speaker_id for s in speaker_segments))
)
# Step 4: Store results
log.info("Storing transcript in database")
await self.db.update_job_status(job_id, "processing", progress=0.9)
await self._store_transcript(
meeting_id,
transcription,
speaker_segments
)
# Mark job complete
await self.db.update_job_status(
job_id,
"completed",
result={
"segments": len(transcription.segments),
"duration": transcription.duration,
"language": transcription.language,
"speakers": len(set(s.speaker_id for s in speaker_segments)) if speaker_segments else 0
}
)
# Update meeting status - ready for summarization
await self.db.update_meeting_status(meeting_id, "summarizing")
log.info("Job completed successfully", job_id=job_id)
async def _extract_audio(self, video_path: str, meeting_id: str) -> str:
"""Extract audio from video file using ffmpeg."""
output_dir = os.path.join(settings.audio_output_path, meeting_id)
os.makedirs(output_dir, exist_ok=True)
audio_path = os.path.join(output_dir, "audio.wav")
cmd = [
"ffmpeg",
"-i", video_path,
"-vn", # No video
"-acodec", "pcm_s16le", # PCM 16-bit
"-ar", str(settings.audio_sample_rate), # Sample rate
"-ac", str(settings.audio_channels), # Mono
"-y", # Overwrite
audio_path
]
log.debug("Running ffmpeg", cmd=" ".join(cmd))
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
_, stderr = await process.communicate()
if process.returncode != 0:
raise RuntimeError(f"FFmpeg failed: {stderr.decode()}")
log.info("Audio extracted", output=audio_path)
return audio_path
async def _store_transcript(
self,
meeting_id: str,
transcription: TranscriptionResult,
speaker_segments: list
):
"""Store transcript segments in database."""
# Create a map from time ranges to speakers
speaker_map = {}
for seg in speaker_segments:
speaker_map[(seg.start, seg.end)] = (seg.speaker_id, seg.speaker_label)
# Store each transcript segment
for i, segment in enumerate(transcription.segments):
# Find matching speaker
speaker_id = None
speaker_label = None
for (start, end), (sid, slabel) in speaker_map.items():
if segment.start >= start and segment.end <= end:
speaker_id = sid
speaker_label = slabel
break
# If no exact match, find closest overlap
if speaker_id is None:
for seg in speaker_segments:
if segment.start < seg.end and segment.end > seg.start:
speaker_id = seg.speaker_id
speaker_label = seg.speaker_label
break
await self.db.insert_transcript_segment(
meeting_id=meeting_id,
segment_index=i,
start_time=segment.start,
end_time=segment.end,
text=segment.text,
speaker_id=speaker_id,
speaker_label=speaker_label,
confidence=segment.confidence,
language=transcription.language
)

View File

@ -0,0 +1,211 @@
"""
Whisper.cpp transcription wrapper.
Uses the whisper CLI to transcribe audio files.
"""
import json
import os
import subprocess
import tempfile
from dataclasses import dataclass
from typing import List, Optional
import structlog
log = structlog.get_logger()
@dataclass
class TranscriptSegment:
"""A single transcript segment."""
start: float
end: float
text: str
confidence: Optional[float] = None
@dataclass
class TranscriptionResult:
"""Result of a transcription job."""
segments: List[TranscriptSegment]
language: str
duration: float
text: str
class WhisperTranscriber:
"""Wrapper for whisper.cpp transcription."""
def __init__(
self,
model_path: str = "/models/ggml-small.bin",
threads: int = 8,
language: str = "en"
):
self.model_path = model_path
self.threads = threads
self.language = language
self.whisper_bin = "/usr/local/bin/whisper"
# Verify whisper binary exists
if not os.path.exists(self.whisper_bin):
raise RuntimeError(f"Whisper binary not found at {self.whisper_bin}")
# Verify model exists
if not os.path.exists(model_path):
raise RuntimeError(f"Whisper model not found at {model_path}")
log.info(
"WhisperTranscriber initialized",
model=model_path,
threads=threads,
language=language
)
def transcribe(
self,
audio_path: str,
language: Optional[str] = None,
translate: bool = False
) -> TranscriptionResult:
"""
Transcribe an audio file.
Args:
audio_path: Path to the audio file (WAV format, 16kHz mono)
language: Language code (e.g., 'en', 'es', 'fr') or None for auto-detect
translate: If True, translate to English
Returns:
TranscriptionResult with segments and full text
"""
if not os.path.exists(audio_path):
raise FileNotFoundError(f"Audio file not found: {audio_path}")
log.info("Starting transcription", audio_path=audio_path, language=language)
# Create temp file for JSON output
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
output_json = tmp.name
try:
# Build whisper command
cmd = [
self.whisper_bin,
"-m", self.model_path,
"-f", audio_path,
"-t", str(self.threads),
"-oj", # Output JSON
"-of", output_json.replace(".json", ""), # Output file prefix
"--print-progress",
]
# Add language if specified
if language:
cmd.extend(["-l", language])
else:
cmd.extend(["-l", self.language])
# Add translate flag if needed
if translate:
cmd.append("--translate")
log.debug("Running whisper command", cmd=" ".join(cmd))
# Run whisper
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=7200 # 2 hour timeout
)
if result.returncode != 0:
log.error(
"Whisper transcription failed",
returncode=result.returncode,
stderr=result.stderr
)
raise RuntimeError(f"Whisper failed: {result.stderr}")
# Parse JSON output
with open(output_json, "r") as f:
whisper_output = json.load(f)
# Extract segments
segments = []
full_text_parts = []
for item in whisper_output.get("transcription", []):
segment = TranscriptSegment(
start=item["offsets"]["from"] / 1000.0, # Convert ms to seconds
end=item["offsets"]["to"] / 1000.0,
text=item["text"].strip(),
confidence=item.get("confidence")
)
segments.append(segment)
full_text_parts.append(segment.text)
# Get detected language
detected_language = whisper_output.get("result", {}).get("language", language or self.language)
# Calculate total duration
duration = segments[-1].end if segments else 0.0
log.info(
"Transcription complete",
segments=len(segments),
duration=duration,
language=detected_language
)
return TranscriptionResult(
segments=segments,
language=detected_language,
duration=duration,
text=" ".join(full_text_parts)
)
finally:
# Clean up temp files
for ext in [".json", ".txt", ".vtt", ".srt"]:
tmp_file = output_json.replace(".json", ext)
if os.path.exists(tmp_file):
os.remove(tmp_file)
def transcribe_with_timestamps(
self,
audio_path: str,
language: Optional[str] = None
) -> List[dict]:
"""
Transcribe with word-level timestamps.
Returns list of dicts with word, start, end, confidence.
"""
result = self.transcribe(audio_path, language)
# Convert segments to word-level format
# Note: whisper.cpp provides segment-level timestamps by default
# For true word-level, we'd need the --max-len 1 flag but it's slower
words = []
for segment in result.segments:
# Estimate word timestamps within segment
segment_words = segment.text.split()
if not segment_words:
continue
duration = segment.end - segment.start
word_duration = duration / len(segment_words)
for i, word in enumerate(segment_words):
words.append({
"word": word,
"start": segment.start + (i * word_duration),
"end": segment.start + ((i + 1) * word_duration),
"confidence": segment.confidence
})
return words
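# Minimal usage sketch (model and audio paths are illustrative and must exist):
#
#   transcriber = WhisperTranscriber(model_path="/models/ggml-small.bin", threads=4)
#   result = transcriber.transcribe("/audio/meeting/audio.wav")
#   for seg in result.segments:
#       print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.text}")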

View File

@ -0,0 +1,41 @@
# Transcription Service Dependencies
# Web framework
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9
# Job queue
redis==5.0.1
rq==1.16.0
# Database
asyncpg==0.29.0
sqlalchemy[asyncio]==2.0.25
psycopg2-binary==2.9.9
# Audio processing
pydub==0.25.1
soundfile==0.12.1
librosa==0.10.1
numpy==1.26.4
# Speaker diarization
resemblyzer==0.1.3
torch==2.2.0
torchaudio==2.2.0
scipy==1.12.0
scikit-learn==1.4.0
# Sentence embeddings (for semantic search)
sentence-transformers==2.3.1
# Utilities
pydantic==2.6.1
pydantic-settings==2.1.0
python-dotenv==1.0.1
httpx==0.26.0
tenacity==8.2.3
# Logging & monitoring
structlog==24.1.0