Initial commit: P2P Wiki AI system
- RAG-based chat with 39k wiki articles (232k chunks)
- Article ingress pipeline for processing external URLs
- Review queue for AI-generated content
- FastAPI backend with web UI
- Traefik-ready Docker setup for p2pwiki.jeffemmett.com

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in: commit 4ebd90cc64
.env.example

@@ -0,0 +1,17 @@
# P2P Wiki AI Configuration

# Ollama (Local LLM)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2

# Claude API (Optional - for higher quality article drafts)
ANTHROPIC_API_KEY=
CLAUDE_MODEL=claude-sonnet-4-20250514

# Hybrid Routing
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true

# Server
HOST=0.0.0.0
PORT=8420
.gitignore

@@ -0,0 +1,28 @@
# Virtual environment
.venv/
venv/
env/

# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/

# Data files (too large for git)
data/articles.json
data/chroma/
data/review_queue/
xmldump/
xmldump-2014.tar.gz
articles/
articles.tar.gz

# Environment
.env

# IDE
.idea/
.vscode/
*.swp
Dockerfile

@@ -0,0 +1,49 @@
# P2P Wiki AI - Multi-stage build
FROM python:3.11-slim as builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY pyproject.toml .
RUN pip install --no-cache-dir build && \
    pip wheel --no-cache-dir --wheel-dir /wheels .

# Production image
FROM python:3.11-slim

WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libxml2 \
    && rm -rf /var/lib/apt/lists/*

# Copy wheels and install
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels

# Copy application code
COPY src/ src/
COPY web/ web/

# Create data directories
RUN mkdir -p data/chroma data/review_queue

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Expose port
EXPOSE 8420

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8420/health')" || exit 1

# Run the application
CMD ["python", "-m", "uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8420"]
README.md

@@ -0,0 +1,199 @@
# P2P Wiki AI

AI-augmented system for the P2P Foundation Wiki with two main features:

1. **Conversational Agent** - Ask questions about the 23,000+ wiki articles using RAG (Retrieval Augmented Generation)
2. **Article Ingress Pipeline** - Drop article URLs to automatically analyze content, find matching wiki articles for citations, and generate draft articles

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       P2P Wiki AI System                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────┐         ┌─────────────────┐               │
│   │   Chat (Q&A)    │         │  Ingress Tool   │               │
│   │    via RAG      │         │   (URL Drop)    │               │
│   └────────┬────────┘         └────────┬────────┘               │
│            │                           │                        │
│            └───────────┬───────────────┘                        │
│                        ▼                                        │
│            ┌───────────────────────┐                            │
│            │    FastAPI Backend    │                            │
│            └───────────┬───────────┘                            │
│                        │                                        │
│         ┌──────────────┼──────────────┐                         │
│         ▼              ▼              ▼                         │
│   ┌──────────┐  ┌─────────────┐  ┌──────────────┐               │
│   │ ChromaDB │  │   Ollama/   │  │   Article    │               │
│   │ (Vector) │  │   Claude    │  │   Scraper    │               │
│   └──────────┘  └─────────────┘  └──────────────┘               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Quick Start

### 1. Prerequisites

- Python 3.10+
- [Ollama](https://ollama.ai) installed locally (or access to a remote Ollama server)
- Optional: Anthropic API key for Claude (higher quality article drafts)

### 2. Install Dependencies

```bash
cd /home/jeffe/Github/p2pwiki-content
pip install -e .
```

### 3. Parse Wiki Content

Convert the MediaWiki XML dumps to searchable JSON:

```bash
python -m src.parser
```

This creates `data/articles.json` with all parsed articles (~23,000 pages).
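The parser itself (`src/parser.py`) is not shown in this excerpt. As a rough, hedged sketch of what parsing a MediaWiki export involves, here is a standard-library-only example; the element names follow the MediaWiki export schema, the namespace version (0.10 here) depends on the actual dump, and `parse_pages` is a hypothetical helper rather than the project's real API:

```python
# Minimal sketch of extracting page titles and revision text from a
# MediaWiki XML export. Assumes the standard <page>/<revision>/<text>
# layout; the namespace URI varies by export schema version.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def parse_pages(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter(f"{NS}page"):
        title = page.findtext(f"{NS}title", default="")
        text = page.findtext(f"{NS}revision/{NS}text", default="")
        pages.append({"title": title, "text": text})
    return pages

sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Commons</title>
    <revision><text>Shared resources...</text></revision>
  </page>
</mediawiki>"""

print(parse_pages(sample))
```

The real parser additionally strips wiki markup into `plain_text` and collects categories, as the `WikiArticle` usage elsewhere in this commit suggests.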
### 4. Generate Embeddings

Create the vector store for semantic search:

```bash
python -m src.embeddings
```

This creates the ChromaDB vector store in `data/chroma/`. Takes a few minutes.
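As a sanity check on store size: the embedding step (see `src/embeddings.py` in this commit) splits each article into 1,000-character windows with 200 characters of overlap, i.e. an effective stride of 800 characters. A standalone sketch of that arithmetic, with the sentence-boundary snapping the real code does omitted for clarity:

```python
# Window arithmetic mirroring CHUNK_SIZE/CHUNK_OVERLAP from src.embeddings.
# Each chunk covers 1000 chars and starts 800 chars after the previous one.
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def chunk_spans(length: int) -> list[tuple[int, int]]:
    spans, start = [], 0
    while start < length:
        end = min(start + CHUNK_SIZE, length)
        spans.append((start, end))
        if end == length:
            break
        start = end - CHUNK_OVERLAP
    return spans

print(len(chunk_spans(5000)))  # a 5,000-char article -> 6 chunks
```

At this rate a ~23,000-article corpus producing a couple hundred thousand chunks is plausible.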
### 5. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings
```

### 6. Run the Server

```bash
python -m src.api
```

Visit http://localhost:8420/ui for the web interface.

## Docker Deployment

For production deployment on the RS 8000:

```bash
# Build and run
docker compose up -d --build

# Check logs
docker compose logs -f

# Access at http://localhost:8420/ui
# Or via Traefik at https://p2pwiki.jeffemmett.com
```

## API Endpoints

### Chat

```bash
# Ask a question
curl -X POST http://localhost:8420/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is commons-based peer production?"}'
```

### Ingress

```bash
# Process an external article
curl -X POST http://localhost:8420/ingress \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article-about-cooperatives"}'
```

### Review Queue

```bash
# Get all items in review queue
curl http://localhost:8420/review

# Approve a draft article
curl -X POST http://localhost:8420/review/action \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/item.json", "item_type": "draft", "item_index": 0, "action": "approve"}'
```

### Search

```bash
# Direct vector search
curl "http://localhost:8420/search?q=cooperative%20economics&n=10"

# List article titles
curl "http://localhost:8420/articles?limit=100"
```
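The same calls can be made from Python with only the standard library. `chat_payload` and `post` below are illustrative helpers, not part of the project; `post` performs a live HTTP request against a running server, so it is left commented out here:

```python
# Small client sketch for the endpoints above. Only payload construction
# is exercised; post() would hit a live server at BASE_URL.
import json
import urllib.request

BASE_URL = "http://localhost:8420"

def chat_payload(query: str, n_results: int = 5) -> dict:
    # Mirrors the ChatRequest model fields (query, n_results).
    return {"query": query, "n_results": n_results}

def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running:
# post("/chat", chat_payload("What is commons-based peer production?"))
print(chat_payload("What is commons-based peer production?"))
```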
## Hybrid AI Routing

The system uses intelligent routing between local (Ollama) and cloud (Claude) LLMs:

| Task | Default LLM | Reasoning |
|------|-------------|-----------|
| Chat Q&A | Ollama | Fast, free, good enough for retrieval-based answers |
| Content Analysis | Claude | Better at extracting topics and identifying wiki relevance |
| Draft Generation | Claude | Higher quality article writing |
| Embeddings | Local (sentence-transformers) | Fast, free, optimized for semantic search |

Configure in `.env`:
```
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true
```
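The routing rule in the table can be sketched as a single decision function. The names and flags here are illustrative; `src/llm.py` is not shown in this excerpt, so this is an assumption about the logic rather than the actual implementation:

```python
# Hypothetical sketch of the hybrid routing rule: drafts and content
# analysis prefer Claude when enabled and an API key is configured;
# chat stays on Ollama; everything falls back to the local model.
def pick_backend(task: str, *, use_claude_for_drafts: bool = True,
                 use_ollama_for_chat: bool = True,
                 have_api_key: bool = True) -> str:
    if task in ("draft", "analysis") and use_claude_for_drafts and have_api_key:
        return "claude"
    if task == "chat" and use_ollama_for_chat:
        return "ollama"
    return "ollama"  # local fallback when Claude is disabled or unavailable

print(pick_backend("draft"), pick_backend("chat"))
```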
## Project Structure

```
p2pwiki-content/
├── src/
│   ├── api.py          # FastAPI backend
│   ├── config.py       # Configuration settings
│   ├── embeddings.py   # Vector store (ChromaDB)
│   ├── ingress.py      # Article scraper & analyzer
│   ├── llm.py          # LLM client (Ollama/Claude)
│   ├── parser.py       # MediaWiki XML parser
│   └── rag.py          # RAG chat system
├── web/
│   └── index.html      # Web UI
├── data/
│   ├── articles.json   # Parsed wiki content
│   ├── chroma/         # Vector store
│   └── review_queue/   # Pending ingress items
├── xmldump/            # MediaWiki XML dumps
├── docker-compose.yml
├── Dockerfile
└── pyproject.toml
```

## Content Coverage

The P2P Foundation Wiki contains ~23,000 articles covering:

- Peer-to-peer networks and culture
- Commons-based peer production (CBPP)
- Alternative economics and post-capitalism
- Cooperative business models
- Open source and free culture
- Collaborative governance
- Sustainability and ecology

## License

The wiki content is from the P2P Foundation under their respective licenses.
The AI system code is provided as-is for educational purposes.
docker-compose.yml

@@ -0,0 +1,38 @@
version: '3.8'

services:
  p2pwiki-ai:
    build: .
    container_name: p2pwiki-ai
    restart: unless-stopped
    ports:
      - "8420:8420"
    volumes:
      # Persist vector store and review queue
      - ./data:/app/data
      # Mount XML dumps for parsing (read-only)
      - ./xmldump:/app/xmldump:ro
    environment:
      # Ollama connection (adjust host for your setup)
      - OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://host.docker.internal:11434}
      - OLLAMA_MODEL=${OLLAMA_MODEL:-llama3.2}
      # Claude API (optional, for higher quality drafts)
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
      - CLAUDE_MODEL=${CLAUDE_MODEL:-claude-sonnet-4-20250514}
      # Hybrid routing settings
      - USE_CLAUDE_FOR_DRAFTS=${USE_CLAUDE_FOR_DRAFTS:-true}
      - USE_OLLAMA_FOR_CHAT=${USE_OLLAMA_FOR_CHAT:-true}
    labels:
      # Traefik labels for reverse proxy
      - "traefik.enable=true"
      - "traefik.http.routers.p2pwiki-ai.rule=Host(`p2pwiki.jeffemmett.com`)"
      - "traefik.http.services.p2pwiki-ai.loadbalancer.server.port=8420"
    networks:
      - traefik-public
    # Add extra_hosts for Docker Desktop to access host services
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  traefik-public:
    external: true
File diff suppressed because it is too large
pyproject.toml

@@ -0,0 +1,64 @@
[project]
name = "p2pwiki-ai"
version = "0.1.0"
description = "AI-augmented system for P2P Foundation Wiki - chat agent and ingress pipeline"
requires-python = ">=3.10"
dependencies = [
    # Core
    "fastapi>=0.109.0",
    "uvicorn[standard]>=0.27.0",
    "pydantic>=2.5.0",
    "pydantic-settings>=2.1.0",

    # XML parsing
    "lxml>=5.1.0",

    # Vector store & embeddings
    "chromadb>=0.4.22",
    "sentence-transformers>=2.3.0",

    # LLM integration
    "openai>=1.10.0",      # For Ollama-compatible API
    "anthropic>=0.18.0",   # For Claude API
    "httpx>=0.26.0",

    # Article scraping
    "trafilatura>=1.6.0",
    "newspaper3k>=0.2.8",
    "beautifulsoup4>=4.12.0",
    "requests>=2.31.0",

    # Utilities
    "python-dotenv>=1.0.0",
    "rich>=13.7.0",
    "tqdm>=4.66.0",
    "tenacity>=8.2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.23.0",
    "black>=24.1.0",
    "ruff>=0.1.0",
]

[project.scripts]
p2pwiki-parse = "src.parser:main"
p2pwiki-embed = "src.embeddings:main"
p2pwiki-serve = "src.api:main"

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["."]

[tool.black]
line-length = 100
target-version = ["py310"]

[tool.ruff]
line-length = 100
select = ["E", "F", "I", "N", "W"]
src/__init__.py

@@ -0,0 +1 @@
"""P2P Wiki AI System - Chat agent and ingress pipeline."""
src/api.py

@@ -0,0 +1,320 @@
"""FastAPI backend for P2P Wiki AI system."""

import asyncio
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Optional

from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse
from pydantic import BaseModel, HttpUrl

from .config import settings
from .embeddings import WikiVectorStore
from .rag import WikiRAG, RAGResponse
from .ingress import IngressPipeline, get_review_queue, approve_item, reject_item

# Global instances
vector_store: Optional[WikiVectorStore] = None
rag_system: Optional[WikiRAG] = None
ingress_pipeline: Optional[IngressPipeline] = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize services on startup."""
    global vector_store, rag_system, ingress_pipeline

    print("Initializing P2P Wiki AI system...")

    # Check if vector store has been populated
    chroma_path = settings.chroma_persist_dir
    if not chroma_path.exists() or not any(chroma_path.iterdir()):
        print("WARNING: Vector store not initialized. Run 'python -m src.parser' and 'python -m src.embeddings' first.")
    else:
        vector_store = WikiVectorStore()
        rag_system = WikiRAG(vector_store)
        ingress_pipeline = IngressPipeline(vector_store)
        print(f"Loaded vector store with {vector_store.get_stats()['total_chunks']} chunks")

    yield

    print("Shutting down...")


app = FastAPI(
    title="P2P Wiki AI",
    description="AI-augmented system for P2P Foundation Wiki - chat agent and ingress pipeline",
    version="0.1.0",
    lifespan=lifespan,
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure appropriately for production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# --- Request/Response Models ---


class ChatRequest(BaseModel):
    """Chat request model."""

    query: str
    n_results: int = 5
    filter_categories: Optional[list[str]] = None


class ChatResponse(BaseModel):
    """Chat response model."""

    answer: str
    sources: list[dict]
    query: str


class IngressRequest(BaseModel):
    """Ingress request model."""

    url: HttpUrl


class IngressResponse(BaseModel):
    """Ingress response model."""

    status: str
    message: str
    scraped_title: Optional[str] = None
    topics_found: int = 0
    wiki_matches: int = 0
    drafts_generated: int = 0
    queue_file: Optional[str] = None


class ReviewActionRequest(BaseModel):
    """Review action request model."""

    filepath: str
    item_type: str  # "match" or "draft"
    item_index: int
    action: str  # "approve" or "reject"


# --- API Endpoints ---


@app.get("/")
async def root():
    """Root endpoint."""
    return {
        "name": "P2P Wiki AI",
        "version": "0.1.0",
        "status": "running",
        "vector_store_ready": vector_store is not None,
    }


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "vector_store_ready": vector_store is not None,
    }


@app.get("/stats")
async def stats():
    """Get system statistics."""
    if not vector_store:
        return {"error": "Vector store not initialized"}

    return {
        "vector_store": vector_store.get_stats(),
        "review_queue_count": len(get_review_queue()),
    }


# --- Chat Endpoints ---


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Chat with the wiki knowledge base."""
    if not rag_system:
        raise HTTPException(
            status_code=503,
            detail="RAG system not initialized. Run indexing first.",
        )

    response = await rag_system.ask(
        query=request.query,
        n_results=request.n_results,
        filter_categories=request.filter_categories,
    )

    return ChatResponse(
        answer=response.answer,
        sources=response.sources,
        query=response.query,
    )


@app.post("/chat/clear")
async def clear_chat():
    """Clear chat history."""
    if rag_system:
        rag_system.clear_history()
    return {"status": "cleared"}


@app.get("/chat/suggestions")
async def chat_suggestions(q: str = ""):
    """Get article title suggestions for autocomplete."""
    if not rag_system or not q:
        return {"suggestions": []}

    suggestions = rag_system.get_suggestions(q)
    return {"suggestions": suggestions}


# --- Ingress Endpoints ---


@app.post("/ingress", response_model=IngressResponse)
async def ingress(request: IngressRequest, background_tasks: BackgroundTasks):
    """
    Process an external article URL through the ingress pipeline.

    This scrapes the article, analyzes it for wiki relevance,
    finds matching existing articles, and generates draft articles.
    """
    if not ingress_pipeline:
        raise HTTPException(
            status_code=503,
            detail="Ingress pipeline not initialized. Run indexing first.",
        )

    try:
        result = await ingress_pipeline.process(str(request.url))

        return IngressResponse(
            status="success",
            message="Article processed successfully",
            scraped_title=result.scraped.title,
            topics_found=len(result.analysis.get("main_topics", [])),
            wiki_matches=len(result.wiki_matches),
            drafts_generated=len(result.draft_articles),
            queue_file=result.timestamp,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


# --- Review Queue Endpoints ---


@app.get("/review")
async def get_review_items():
    """Get all items in the review queue."""
    items = get_review_queue()
    return {"count": len(items), "items": items}


@app.get("/review/{filename}")
async def get_review_item(filename: str):
    """Get a specific review item."""
    filepath = settings.review_queue_dir / filename
    if not filepath.exists():
        raise HTTPException(status_code=404, detail="Review item not found")

    import json

    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    return data


@app.post("/review/action")
async def review_action(request: ReviewActionRequest):
    """Approve or reject a review item."""
    if request.action == "approve":
        success = approve_item(request.filepath, request.item_type, request.item_index)
    elif request.action == "reject":
        success = reject_item(request.filepath, request.item_type, request.item_index)
    else:
        raise HTTPException(status_code=400, detail="Invalid action")

    if success:
        return {"status": "success", "action": request.action}
    else:
        raise HTTPException(status_code=500, detail="Action failed")


# --- Search Endpoints ---


@app.get("/search")
async def search(q: str, n: int = 10, categories: Optional[str] = None):
    """Direct search of the vector store."""
    if not vector_store:
        raise HTTPException(status_code=503, detail="Vector store not initialized")

    filter_cats = categories.split(",") if categories else None
    results = vector_store.search(q, n_results=n, filter_categories=filter_cats)

    return {"query": q, "count": len(results), "results": results}


@app.get("/articles")
async def list_articles(limit: int = 100, offset: int = 0):
    """List article titles."""
    if not vector_store:
        raise HTTPException(status_code=503, detail="Vector store not initialized")

    titles = vector_store.get_article_titles()
    return {
        "total": len(titles),
        "limit": limit,
        "offset": offset,
        "titles": titles[offset : offset + limit],
    }


# --- Static Files (Web UI) ---

web_dir = Path(__file__).parent.parent / "web"
if web_dir.exists():
    app.mount("/static", StaticFiles(directory=str(web_dir)), name="static")


@app.get("/ui")
async def ui():
    """Serve the web UI."""
    index_path = web_dir / "index.html"
    if index_path.exists():
        return FileResponse(index_path)
    raise HTTPException(status_code=404, detail="Web UI not found")


def main():
    """Run the API server."""
    import uvicorn

    uvicorn.run(
        "src.api:app",
        host=settings.host,
        port=settings.port,
        reload=True,
    )


if __name__ == "__main__":
    main()
src/config.py

@@ -0,0 +1,51 @@
"""Configuration settings for P2P Wiki AI system."""

from pathlib import Path
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    # Paths
    project_root: Path = Path(__file__).parent.parent
    data_dir: Path = project_root / "data"
    xmldump_dir: Path = project_root / "xmldump"

    # Vector store
    chroma_persist_dir: Path = data_dir / "chroma"
    embedding_model: str = "all-MiniLM-L6-v2"  # Fast, good quality

    # Ollama (local LLM)
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama3.2"  # Default model for local inference

    # Claude API (for complex tasks)
    anthropic_api_key: str = ""
    claude_model: str = "claude-sonnet-4-20250514"

    # Hybrid routing thresholds
    use_claude_for_drafts: bool = True  # Use Claude for article drafting
    use_ollama_for_chat: bool = True  # Use Ollama for simple Q&A

    # MediaWiki
    mediawiki_api_url: str = ""  # Set if you have a live wiki API

    # Server
    host: str = "0.0.0.0"
    port: int = 8420

    # Review queue
    review_queue_dir: Path = data_dir / "review_queue"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"


settings = Settings()

# Ensure directories exist
settings.data_dir.mkdir(parents=True, exist_ok=True)
settings.chroma_persist_dir.mkdir(parents=True, exist_ok=True)
settings.review_queue_dir.mkdir(parents=True, exist_ok=True)
src/embeddings.py

@@ -0,0 +1,256 @@
"""Vector store setup and embedding generation using ChromaDB."""

import json
from pathlib import Path
from typing import Optional

import chromadb
from chromadb.config import Settings as ChromaSettings
from rich.console import Console
from rich.progress import Progress
from sentence_transformers import SentenceTransformer

from .config import settings
from .parser import WikiArticle

console = Console()

# Chunk size for embedding (in characters)
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


class WikiVectorStore:
    """Vector store for wiki articles using ChromaDB."""

    def __init__(self, persist_dir: Optional[Path] = None):
        self.persist_dir = persist_dir or settings.chroma_persist_dir

        # Initialize ChromaDB
        self.client = chromadb.PersistentClient(
            path=str(self.persist_dir),
            settings=ChromaSettings(anonymized_telemetry=False),
        )

        # Create or get collection
        self.collection = self.client.get_or_create_collection(
            name="wiki_articles",
            metadata={"hnsw:space": "cosine"},
        )

        # Load embedding model
        console.print(f"[cyan]Loading embedding model: {settings.embedding_model}[/cyan]")
        self.model = SentenceTransformer(settings.embedding_model)
        console.print("[green]Model loaded[/green]")

    def _chunk_text(self, text: str, title: str) -> list[tuple[str, dict]]:
        """Split text into overlapping chunks with metadata."""
        if len(text) <= CHUNK_SIZE:
            return [(text, {"chunk_index": 0, "total_chunks": 1})]

        chunks = []
        start = 0
        chunk_index = 0

        while start < len(text):
            end = start + CHUNK_SIZE

            # Try to break at sentence boundary
            if end < len(text):
                # Look for sentence end within last 100 chars
                for i in range(min(100, end - start)):
                    if text[end - i] in ".!?\n":
                        end = end - i + 1
                        break

            chunk_text = text[start:end].strip()
            if chunk_text:
                # Prepend title for context
                chunk_with_title = f"{title}\n\n{chunk_text}"
                chunks.append(
                    (chunk_with_title, {"chunk_index": chunk_index, "total_chunks": -1})
                )
                chunk_index += 1

            start = end - CHUNK_OVERLAP

        # Update total_chunks now that the final count is known
        for _, meta in chunks:
            meta["total_chunks"] = len(chunks)

        return chunks

    def get_embedded_article_ids(self) -> set:
        """Get set of article IDs that are already embedded."""
        results = self.collection.get(include=["metadatas"])
        article_ids = set()
        for meta in results["metadatas"]:
            if meta and "article_id" in meta:
                article_ids.add(meta["article_id"])
        return article_ids

    def add_articles(self, articles: list[WikiArticle], batch_size: int = 100, resume: bool = True):
        """Add articles to the vector store."""
        console.print(f"[cyan]Processing {len(articles)} articles...[/cyan]")

        # Check for already embedded articles if resuming
        if resume:
            embedded_ids = self.get_embedded_article_ids()
            original_count = len(articles)
            articles = [a for a in articles if a.id not in embedded_ids]
            skipped = original_count - len(articles)
            if skipped > 0:
                console.print(f"[yellow]Skipping {skipped} already-embedded articles[/yellow]")
            if not articles:
                console.print("[green]All articles already embedded![/green]")
                return

        all_chunks = []
        all_ids = []
        all_metadatas = []

        with Progress() as progress:
            task = progress.add_task("[cyan]Chunking articles...", total=len(articles))

            for article in articles:
                if not article.plain_text:
                    progress.advance(task)
                    continue

                chunks = self._chunk_text(article.plain_text, article.title)

                for chunk_text, chunk_meta in chunks:
                    chunk_id = f"{article.id}_{chunk_meta['chunk_index']}"

                    metadata = {
                        "article_id": article.id,
                        "title": article.title,
                        "categories": ",".join(article.categories[:10]),  # Limit categories
                        "timestamp": article.timestamp,
                        "chunk_index": chunk_meta["chunk_index"],
                        "total_chunks": chunk_meta["total_chunks"],
                    }

                    all_chunks.append(chunk_text)
                    all_ids.append(chunk_id)
                    all_metadatas.append(metadata)

                progress.advance(task)

        console.print(f"[cyan]Created {len(all_chunks)} chunks from {len(articles)} articles[/cyan]")

        # Generate embeddings and add in batches
        console.print("[cyan]Generating embeddings and adding to vector store...[/cyan]")

        with Progress() as progress:
            task = progress.add_task(
                "[cyan]Embedding and storing...", total=len(all_chunks) // batch_size + 1
            )

            for i in range(0, len(all_chunks), batch_size):
                batch_chunks = all_chunks[i : i + batch_size]
                batch_ids = all_ids[i : i + batch_size]
                batch_metadatas = all_metadatas[i : i + batch_size]

                # Generate embeddings
                embeddings = self.model.encode(batch_chunks, show_progress_bar=False)

                # Add to collection
                self.collection.add(
                    ids=batch_ids,
                    embeddings=embeddings.tolist(),
                    documents=batch_chunks,
                    metadatas=batch_metadatas,
                )

                progress.advance(task)

        console.print(f"[green]Added {len(all_chunks)} chunks to vector store[/green]")

    def search(
        self,
        query: str,
        n_results: int = 5,
        filter_categories: Optional[list[str]] = None,
    ) -> list[dict]:
        """Search for relevant chunks."""
        query_embedding = self.model.encode([query])[0]

        where_filter = None
        if filter_categories:
            # ChromaDB where filter for categories
            where_filter = {
                "$or": [{"categories": {"$contains": cat}} for cat in filter_categories]
            }

        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=n_results,
            where=where_filter,
            include=["documents", "metadatas", "distances"],
        )

        # Format results
        formatted = []
        if results["documents"] and results["documents"][0]:
            for i, doc in enumerate(results["documents"][0]):
                formatted.append(
                    {
                        "content": doc,
                        "metadata": results["metadatas"][0][i],
                        "distance": results["distances"][0][i],
                    }
                )

        return formatted

    def get_article_titles(self) -> list[str]:
        """Get all unique article titles in the store."""
        # Get all metadata
        results = self.collection.get(include=["metadatas"])
        titles = set()
        for meta in results["metadatas"]:
            if meta and "title" in meta:
                titles.add(meta["title"])
        return sorted(titles)

    def get_stats(self) -> dict:
        """Get statistics about the vector store."""
        count = self.collection.count()

        # Get sample of metadatas to count unique articles
        sample = self.collection.get(limit=10000, include=["metadatas"])
        unique_articles = len(set(m["article_id"] for m in sample["metadatas"] if m))

        return {
            "total_chunks": count,
            "unique_articles_sampled": unique_articles,
            "persist_dir": str(self.persist_dir),
        }


def main():
    """CLI entry point for generating embeddings."""
    articles_path = settings.data_dir / "articles.json"

    if not articles_path.exists():
        console.print(f"[red]Articles file not found: {articles_path}[/red]")
        console.print("[yellow]Run 'python -m src.parser' first to parse XML dumps[/yellow]")
        return

    console.print(f"[cyan]Loading articles from {articles_path}...[/cyan]")
    with open(articles_path, "r", encoding="utf-8") as f:
        articles_data = json.load(f)

    articles = [WikiArticle(**a) for a in articles_data]
    console.print(f"[green]Loaded {len(articles)} articles[/green]")

    store = WikiVectorStore()
    store.add_articles(articles)

    stats = store.get_stats()
    console.print(f"[green]Vector store stats: {stats}[/green]")


if __name__ == "__main__":
    main()
@@ -0,0 +1,467 @@
"""Article ingress pipeline - scrape, analyze, and draft wiki content."""

import json
import re
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

import httpx
import trafilatura
from bs4 import BeautifulSoup
from rich.console import Console

from .config import settings
from .embeddings import WikiVectorStore
from .llm import llm_client

console = Console()


@dataclass
class ScrapedArticle:
    """Represents a scraped external article."""

    url: str
    title: str
    content: str
    author: Optional[str] = None
    date: Optional[str] = None
    domain: str = ""
    word_count: int = 0

    def __post_init__(self):
        if not self.domain:
            self.domain = urlparse(self.url).netloc
        if not self.word_count:
            self.word_count = len(self.content.split())


@dataclass
class WikiMatch:
    """A matching wiki article for citation."""

    title: str
    article_id: int
    relevance_score: float
    categories: list[str]
    suggested_citation: str  # How to cite the scraped article in this wiki page


@dataclass
class DraftArticle:
    """A draft wiki article generated from scraped content."""

    title: str
    content: str  # MediaWiki formatted content
    categories: list[str]
    source_url: str
    source_title: str
    summary: str
    related_articles: list[str]  # Existing wiki articles to link to


@dataclass
class IngressResult:
    """Result of the ingress pipeline."""

    scraped: ScrapedArticle
    analysis: dict  # Topic analysis results
    wiki_matches: list[WikiMatch]  # Existing articles to update with citations
    draft_articles: list[DraftArticle]  # New articles to create
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        return {
            "scraped": asdict(self.scraped),
            "analysis": self.analysis,
            "wiki_matches": [asdict(m) for m in self.wiki_matches],
            "draft_articles": [asdict(d) for d in self.draft_articles],
            "timestamp": self.timestamp,
        }


class ArticleScraper:
    """Scrapes and extracts content from URLs."""

    async def scrape(self, url: str) -> ScrapedArticle:
        """Scrape article content from a URL."""
        console.print(f"[cyan]Scraping: {url}[/cyan]")

        async with httpx.AsyncClient(
            timeout=30.0,
            follow_redirects=True,
            headers={
                "User-Agent": "Mozilla/5.0 (compatible; P2PWikiBot/1.0; +http://p2pfoundation.net)"
            },
        ) as client:
            response = await client.get(url)
            response.raise_for_status()
            html = response.text

        # Use trafilatura for main content extraction
        content = trafilatura.extract(
            html,
            include_comments=False,
            include_tables=True,
            no_fallback=False,
        )

        # Parse once with BeautifulSoup, for metadata and as a content fallback
        soup = BeautifulSoup(html, "html.parser")

        if not content:
            # Fallback to BeautifulSoup text extraction:
            # remove script, style, and page-chrome elements first
            for element in soup(["script", "style", "nav", "footer", "header"]):
                element.decompose()
            content = soup.get_text(separator="\n", strip=True)

        # Extract metadata
        title = ""
        title_tag = soup.find("title")
        if title_tag:
            title = title_tag.get_text(strip=True)
        # Prefer og:title when present
        og_title = soup.find("meta", property="og:title")
        if og_title and og_title.get("content"):
            title = og_title["content"]

        author = None
        author_meta = soup.find("meta", attrs={"name": "author"})
        if author_meta and author_meta.get("content"):
            author = author_meta["content"]

        date = None
        date_meta = soup.find("meta", attrs={"name": "date"}) or soup.find(
            "meta", property="article:published_time"
        )
        if date_meta and date_meta.get("content"):
            date = date_meta["content"]

        return ScrapedArticle(
            url=url,
            title=title,
            content=content or "",
            author=author,
            date=date,
        )


class ContentAnalyzer:
    """Analyzes scraped content for wiki relevance."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()

    async def analyze(self, article: ScrapedArticle) -> dict:
        """Analyze article for topics, concepts, and wiki relevance."""
        # Truncate very long articles for analysis
        content_for_analysis = article.content[:8000]

        # llm_client.analyze() appends the content itself, so the task prompt
        # carries only the instructions (avoids sending the text twice).
        analysis_prompt = f"""Analyze this article for potential wiki content about peer-to-peer culture, commons, alternative economics, and collaborative governance.

Article Title: {article.title}
Source: {article.domain}

Please provide your analysis in the following JSON format:
{{
    "main_topics": ["topic1", "topic2"],
    "key_concepts": ["concept1", "concept2"],
    "relevant_categories": ["category1", "category2"],
    "summary": "2-3 sentence summary",
    "wiki_relevance_score": 0.0-1.0,
    "suggested_article_titles": ["Title 1", "Title 2"],
    "key_quotes": ["notable quote 1", "notable quote 2"],
    "mentioned_organizations": ["org1", "org2"],
    "mentioned_people": ["person1", "person2"]
}}

Focus on topics relevant to:
- Peer-to-peer networks and culture
- Commons-based peer production
- Alternative economics and post-capitalism
- Cooperative business models
- Open source / free culture
- Collaborative governance
- Sustainability and ecology"""

        response = await llm_client.analyze(
            content=content_for_analysis,
            task=analysis_prompt,
            temperature=0.3,
        )

        # Parse JSON from response
        try:
            # Find JSON in response
            json_match = re.search(r"\{[\s\S]*\}", response)
            if json_match:
                analysis = json.loads(json_match.group())
            else:
                analysis = {"error": "Could not parse analysis", "raw": response}
        except json.JSONDecodeError:
            analysis = {"error": "Invalid JSON in analysis", "raw": response}

        return analysis

    async def find_wiki_matches(
        self, article: ScrapedArticle, analysis: dict, n_results: int = 10
    ) -> list[WikiMatch]:
        """Find existing wiki articles that could cite this content."""
        matches = []

        # Search using main topics and concepts
        search_terms = analysis.get("main_topics", []) + analysis.get("key_concepts", [])

        for term in search_terms[:5]:  # Limit searches
            results = self.vector_store.search(term, n_results=3)

            for result in results:
                title = result["metadata"].get("title", "Unknown")
                article_id = result["metadata"].get("article_id", 0)
                distance = result.get("distance", 1.0)

                # Skip if already added
                if any(m.title == title for m in matches):
                    continue

                # Calculate relevance (lower distance = higher relevance)
                relevance = max(0, 1 - distance)

                if relevance > 0.3:  # Threshold for relevance
                    matches.append(
                        WikiMatch(
                            title=title,
                            article_id=article_id,
                            relevance_score=relevance,
                            categories=result["metadata"].get("categories", "").split(","),
                            suggested_citation=f"See also: [{article.title}]({article.url})",
                        )
                    )

        # Sort by relevance and limit
        matches.sort(key=lambda m: m.relevance_score, reverse=True)
        return matches[:n_results]


class DraftGenerator:
    """Generates draft wiki articles from scraped content."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()

    async def generate_drafts(
        self,
        article: ScrapedArticle,
        analysis: dict,
        max_drafts: int = 3,
    ) -> list[DraftArticle]:
        """Generate draft wiki articles based on scraped content."""
        drafts = []

        suggested_titles = analysis.get("suggested_article_titles", [])
        if not suggested_titles:
            return drafts

        for title in suggested_titles[:max_drafts]:
            # Check if article already exists
            existing = self.vector_store.search(title, n_results=1)
            if existing and existing[0].get("distance", 1.0) < 0.1:
                console.print(f"[yellow]Skipping '{title}' - similar article exists[/yellow]")
                continue

            draft = await self._generate_single_draft(article, analysis, title)
            if draft:
                drafts.append(draft)

        return drafts

    async def _generate_single_draft(
        self,
        article: ScrapedArticle,
        analysis: dict,
        title: str,
    ) -> Optional[DraftArticle]:
        """Generate a single draft article."""
        # Find related existing articles
        related_search = self.vector_store.search(title, n_results=5)
        related_titles = [
            r["metadata"].get("title", "")
            for r in related_search
            if r.get("distance", 1.0) < 0.5
        ]

        categories = analysis.get("relevant_categories", [])
        summary = analysis.get("summary", "")

        draft_prompt = f"""Create a MediaWiki-formatted article for the P2P Foundation Wiki.

Article Title: {title}

Source Material:
Title: {article.title}
URL: {article.url}
Summary: {summary}

Key concepts to cover: {', '.join(analysis.get('key_concepts', []))}

Related existing wiki articles: {', '.join(related_titles)}

Categories to include: {', '.join(categories)}

Please write the wiki article in MediaWiki markup format with:
1. An introduction/definition section
2. A "Description" section with key information
3. Links to related wiki articles using [[Article Name]] format
4. A "Sources" section citing the original article
5. Category tags at the end using [[Category:Name]] format

The article should:
- Be encyclopedic and neutral in tone
- Focus on the P2P/commons aspects of the topic
- Be approximately 300-500 words
- Include internal wiki links to related concepts"""

        content = await llm_client.generate_draft(
            draft_prompt,
            system="You are a wiki editor for the P2P Foundation Wiki. Write clear, encyclopedic articles in MediaWiki markup format.",
            temperature=0.5,
        )

        return DraftArticle(
            title=title,
            content=content,
            categories=categories,
            source_url=article.url,
            source_title=article.title,
            summary=summary,
            related_articles=related_titles,
        )


class IngressPipeline:
    """Complete ingress pipeline for processing external articles."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()
        self.scraper = ArticleScraper()
        self.analyzer = ContentAnalyzer(self.vector_store)
        self.generator = DraftGenerator(self.vector_store)

    async def process(self, url: str) -> IngressResult:
        """Process a URL through the complete ingress pipeline."""
        console.print(f"[bold cyan]Processing: {url}[/bold cyan]")

        # Step 1: Scrape
        console.print("[cyan]Step 1/4: Scraping article...[/cyan]")
        scraped = await self.scraper.scrape(url)
        console.print(f"[green]Scraped: {scraped.title} ({scraped.word_count} words)[/green]")

        # Step 2: Analyze
        console.print("[cyan]Step 2/4: Analyzing content...[/cyan]")
        analysis = await self.analyzer.analyze(scraped)
        console.print(f"[green]Found {len(analysis.get('main_topics', []))} main topics[/green]")

        # Step 3: Find wiki matches
        console.print("[cyan]Step 3/4: Finding wiki matches...[/cyan]")
        matches = await self.analyzer.find_wiki_matches(scraped, analysis)
        console.print(f"[green]Found {len(matches)} potential wiki matches[/green]")

        # Step 4: Generate drafts
        console.print("[cyan]Step 4/4: Generating draft articles...[/cyan]")
        drafts = await self.generator.generate_drafts(scraped, analysis)
        console.print(f"[green]Generated {len(drafts)} draft articles[/green]")

        result = IngressResult(
            scraped=scraped,
            analysis=analysis,
            wiki_matches=matches,
            draft_articles=drafts,
        )

        # Save to review queue
        self._save_to_review_queue(result)

        return result

    def _save_to_review_queue(self, result: IngressResult):
        """Save ingress result to the review queue."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        domain = result.scraped.domain.replace(".", "_")
        filename = f"{timestamp}_{domain}.json"
        filepath = settings.review_queue_dir / filename

        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(result.to_dict(), f, indent=2, ensure_ascii=False)

        console.print(f"[green]Saved to review queue: {filepath}[/green]")


def get_review_queue() -> list[dict]:
    """Get all items in the review queue."""
    queue_files = sorted(settings.review_queue_dir.glob("*.json"), reverse=True)

    items = []
    for filepath in queue_files:
        with open(filepath, "r", encoding="utf-8") as f:
            data = json.load(f)
        data["_filepath"] = str(filepath)
        items.append(data)

    return items


def approve_item(filepath: str, item_type: str, item_index: int) -> bool:
    """
    Approve an item from the review queue.

    Args:
        filepath: Path to the review queue JSON file
        item_type: "match" or "draft"
        item_index: Index of the item to approve

    Returns:
        True if successful
    """
    # For now, just mark as approved in the file
    # In production, this would push to the MediaWiki API
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    if item_type == "match":
        if item_index < len(data.get("wiki_matches", [])):
            data["wiki_matches"][item_index]["approved"] = True
    elif item_type == "draft":
        if item_index < len(data.get("draft_articles", [])):
            data["draft_articles"][item_index]["approved"] = True

    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    return True
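
# Usage sketch (illustrative; the filename here is hypothetical - real queue
# files are written by IngressPipeline._save_to_review_queue under
# settings.review_queue_dir):
#   approve_item("data/review_queue/20240101_120000_example_com.json", "draft", 0)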


def reject_item(filepath: str, item_type: str, item_index: int) -> bool:
    """Reject an item from the review queue."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    if item_type == "match":
        if item_index < len(data.get("wiki_matches", [])):
            data["wiki_matches"][item_index]["rejected"] = True
    elif item_type == "draft":
        if item_index < len(data.get("draft_articles", [])):
            data["draft_articles"][item_index]["rejected"] = True

    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    return True
@@ -0,0 +1,153 @@
"""LLM client with hybrid routing between Ollama and Claude."""

from typing import Optional

import httpx
from anthropic import AsyncAnthropic
from tenacity import retry, stop_after_attempt, wait_exponential

from .config import settings


class LLMClient:
    """Unified LLM client with hybrid routing."""

    def __init__(self):
        self.ollama_url = settings.ollama_base_url
        self.ollama_model = settings.ollama_model

        # Initialize Claude client if an API key is set. The async client is
        # used so Claude calls don't block the event loop.
        self.claude_client: Optional[AsyncAnthropic] = None
        if settings.anthropic_api_key:
            self.claude_client = AsyncAnthropic(api_key=settings.anthropic_api_key)

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _call_ollama(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: int = 2048,
    ) -> str:
        """Call the Ollama chat API."""
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})

        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(
                f"{self.ollama_url}/api/chat",
                json={
                    "model": self.ollama_model,
                    "messages": messages,
                    "stream": False,
                    "options": {
                        "temperature": temperature,
                        "num_predict": max_tokens,
                    },
                },
            )
            response.raise_for_status()
            data = response.json()
            return data["message"]["content"]

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _call_claude(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: int = 4096,
    ) -> str:
        """Call the Claude API."""
        if not self.claude_client:
            raise ValueError("Claude API key not configured")

        message = await self.claude_client.messages.create(
            model=settings.claude_model,
            max_tokens=max_tokens,
            system=system or "",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        return message.content[0].text

    async def chat(
        self,
        prompt: str,
        system: Optional[str] = None,
        use_claude: bool = False,
        temperature: float = 0.7,
        max_tokens: int = 2048,
    ) -> str:
        """
        Chat with an LLM using hybrid routing.

        Args:
            prompt: User prompt
            system: System prompt
            use_claude: Prefer the Claude API (falls back to Ollama if it is not configured)
            temperature: Sampling temperature
            max_tokens: Max response tokens

        Returns:
            LLM response text
        """
        if use_claude and self.claude_client:
            return await self._call_claude(prompt, system, temperature, max_tokens)
        return await self._call_ollama(prompt, system, temperature, max_tokens)

    async def generate_draft(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.5,
    ) -> str:
        """
        Generate an article draft - uses Claude for higher quality when available.

        Args:
            prompt: Prompt describing what to generate
            system: System prompt for context
            temperature: Lower for more factual output

        Returns:
            Generated draft text
        """
        # Use Claude for drafts if configured, otherwise fall back to Ollama
        use_claude = settings.use_claude_for_drafts and self.claude_client is not None
        return await self.chat(
            prompt, system, use_claude=use_claude, temperature=temperature, max_tokens=4096
        )

    async def analyze(
        self,
        content: str,
        task: str,
        temperature: float = 0.3,
    ) -> str:
        """
        Analyze content for a specific task - uses Claude for complex analysis when available.

        Args:
            content: Content to analyze
            task: Description of the analysis task
            temperature: Lower for more deterministic output

        Returns:
            Analysis result
        """
        prompt = f"""Task: {task}

Content to analyze:
{content}

Provide your analysis:"""

        use_claude = self.claude_client is not None
        return await self.chat(prompt, use_claude=use_claude, temperature=temperature)


# Singleton instance
llm_client = LLMClient()
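
# Usage sketch (illustrative, inside an async context; routing falls back to
# Ollama automatically when no ANTHROPIC_API_KEY is configured):
#   answer = await llm_client.chat("What is commons-based peer production?")
#   draft = await llm_client.generate_draft("Write a stub article on platform cooperatives")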
@@ -0,0 +1,267 @@
"""MediaWiki XML dump parser - converts to structured JSON."""

import json
import re
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Iterator

from lxml import etree
from rich.progress import Progress
from rich.console import Console

from .config import settings

console = Console()

# MediaWiki namespace
MW_NS = {"mw": "http://www.mediawiki.org/xml/export-0.6/"}


@dataclass
class WikiArticle:
    """Represents a parsed wiki article."""

    id: int
    title: str
    content: str  # Raw wikitext
    plain_text: str  # Cleaned plain text for embedding
    categories: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)  # Internal wiki links
    external_links: list[str] = field(default_factory=list)
    timestamp: str = ""
    contributor: str = ""

    def to_dict(self) -> dict:
        return asdict(self)


def clean_wikitext(text: str) -> str:
    """Convert MediaWiki markup to plain text for embedding."""
    if not text:
        return ""

    # Remove templates {{...}}
    text = re.sub(r"\{\{[^}]+\}\}", "", text)

    # Remove categories [[Category:...]]
    text = re.sub(r"\[\[Category:[^\]]+\]\]", "", text, flags=re.IGNORECASE)

    # Convert wiki links [[Page|Display]] or [[Page]] to just the display text
    text = re.sub(r"\[\[([^|\]]+)\|([^\]]+)\]\]", r"\2", text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)

    # Remove external links: [url text] -> text, bare [url] -> dropped
    text = re.sub(r"\[https?://[^\s\]]+ ([^\]]+)\]", r"\1", text)
    text = re.sub(r"\[https?://[^\]]+\]", "", text)

    # Remove wiki formatting
    text = re.sub(r"'''?([^']+)'''?", r"\1", text)  # Bold/italic
    text = re.sub(r"={2,}([^=]+)={2,}", r"\1", text)  # Headers
    text = re.sub(r"^[*#:;]+", "", text, flags=re.MULTILINE)  # List markers

    # Remove HTML tags
    text = re.sub(r"<[^>]+>", "", text)

    # Clean up whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = re.sub(r" {2,}", " ", text)

    return text.strip()
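
# Illustrative example of the transformations above (hand-checked against the
# regexes, not part of the original source):
#   clean_wikitext("'''Commons''' are [[Shared Resources|resources]].")
#   -> "Commons are resources."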


def extract_categories(text: str) -> list[str]:
    """Extract category names from wikitext."""
    pattern = r"\[\[Category:([^\]|]+)"
    return list(set(re.findall(pattern, text, re.IGNORECASE)))


def extract_wiki_links(text: str) -> list[str]:
    """Extract internal wiki links from wikitext."""
    # Match [[Page]] or [[Page|Display]]
    pattern = r"\[\[([^|\]]+)"
    links = re.findall(pattern, text)
    # Filter out categories and files
    return list(
        set(
            link.strip()
            for link in links
            if not link.lower().startswith(("category:", "file:", "image:"))
        )
    )


def extract_external_links(text: str) -> list[str]:
    """Extract external URLs from wikitext."""
    pattern = r"https?://[^\s\]\)\"']+"
    return list(set(re.findall(pattern, text)))
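
# Illustrative example of the extractors (hand-checked, not part of the
# original source):
#   extract_wiki_links("See [[Commons]] and [[Category:Economics]]")
#   -> ["Commons"]   (the category link is filtered out)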


def parse_xml_file(xml_path: Path) -> Iterator[WikiArticle]:
    """Parse a MediaWiki XML dump file and yield articles."""
    context = etree.iterparse(
        str(xml_path), events=("end",), tag="{http://www.mediawiki.org/xml/export-0.6/}page"
    )

    for event, page in context:
        # Get basic info
        title_elem = page.find("mw:title", MW_NS)
        id_elem = page.find("mw:id", MW_NS)
        ns_elem = page.find("mw:ns", MW_NS)

        # Skip non-main namespace pages (talk, user, etc.)
        if ns_elem is not None and ns_elem.text != "0":
            page.clear()
            continue

        title = title_elem.text if title_elem is not None else ""
        page_id = int(id_elem.text) if id_elem is not None else 0

        # Get latest revision
        revision = page.find("mw:revision", MW_NS)
        if revision is None:
            page.clear()
            continue

        text_elem = revision.find("mw:text", MW_NS)
        timestamp_elem = revision.find("mw:timestamp", MW_NS)
        contributor = revision.find("mw:contributor", MW_NS)

        content = text_elem.text if text_elem is not None else ""
        timestamp = timestamp_elem.text if timestamp_elem is not None else ""

        contributor_name = ""
        if contributor is not None:
            username = contributor.find("mw:username", MW_NS)
            if username is not None:
                contributor_name = username.text or ""

        # Skip redirects and empty pages
        if not content or content.lower().startswith("#redirect"):
            page.clear()
            continue

        article = WikiArticle(
            id=page_id,
            title=title,
            content=content,
            plain_text=clean_wikitext(content),
            categories=extract_categories(content),
            links=extract_wiki_links(content),
            external_links=extract_external_links(content),
            timestamp=timestamp,
            contributor=contributor_name,
        )

        # Clear element to free memory
        page.clear()

        yield article


def parse_all_dumps(output_path: Path | None = None) -> list[WikiArticle]:
    """Parse all XML dump files and optionally save to JSON."""
    xml_files = sorted(settings.xmldump_dir.glob("*.xml"))

    if not xml_files:
        console.print(f"[red]No XML files found in {settings.xmldump_dir}[/red]")
        return []

    console.print(f"[green]Found {len(xml_files)} XML files to parse[/green]")

    all_articles = []

    with Progress() as progress:
        task = progress.add_task("[cyan]Parsing XML files...", total=len(xml_files))

        for xml_file in xml_files:
            progress.update(task, description=f"[cyan]Parsing {xml_file.name}...")

            for article in parse_xml_file(xml_file):
                all_articles.append(article)

            progress.advance(task)

    console.print(f"[green]Parsed {len(all_articles)} articles[/green]")

    if output_path:
        console.print(f"[cyan]Saving to {output_path}...[/cyan]")
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump([a.to_dict() for a in all_articles], f, ensure_ascii=False, indent=2)
        console.print(f"[green]Saved {len(all_articles)} articles to {output_path}[/green]")

    return all_articles


def parse_mediawiki_files(articles_dir: Path, output_path: Path | None = None) -> list[WikiArticle]:
    """Parse individual .mediawiki files from a directory (Codeberg format)."""
    mediawiki_files = list(articles_dir.glob("*.mediawiki"))

    if not mediawiki_files:
        console.print(f"[red]No .mediawiki files found in {articles_dir}[/red]")
        return []

    console.print(f"[green]Found {len(mediawiki_files)} .mediawiki files to parse[/green]")

    all_articles = []

    with Progress() as progress:
        task = progress.add_task("[cyan]Parsing files...", total=len(mediawiki_files))

        for i, filepath in enumerate(mediawiki_files):
            # Title is the filename without extension
            title = filepath.stem

            try:
                content = filepath.read_text(encoding="utf-8", errors="replace")
            except Exception as e:
                console.print(f"[yellow]Warning: Could not read {filepath}: {e}[/yellow]")
                progress.advance(task)
                continue

            # Skip redirects and empty files
            if not content or content.strip().lower().startswith("#redirect"):
                progress.advance(task)
                continue

            article = WikiArticle(
                id=i,
                title=title,
                content=content,
                plain_text=clean_wikitext(content),
                categories=extract_categories(content),
                links=extract_wiki_links(content),
                external_links=extract_external_links(content),
                timestamp="",
                contributor="",
            )

            all_articles.append(article)
            progress.advance(task)

    console.print(f"[green]Parsed {len(all_articles)} articles[/green]")

    if output_path:
        console.print(f"[cyan]Saving to {output_path}...[/cyan]")
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump([a.to_dict() for a in all_articles], f, ensure_ascii=False, indent=2)
        console.print(f"[green]Saved {len(all_articles)} articles to {output_path}[/green]")

    return all_articles


def main():
    """CLI entry point for parsing wiki content."""
    output_path = settings.data_dir / "articles.json"

    # Check for Codeberg-style articles directory first (newer, more complete)
    articles_dir = settings.project_root / "articles" / "articles"
    if articles_dir.exists():
        console.print("[cyan]Found Codeberg-style articles directory, using that...[/cyan]")
        parse_mediawiki_files(articles_dir, output_path)
    else:
        # Fall back to XML dumps
        parse_all_dumps(output_path)


if __name__ == "__main__":
    main()
@@ -0,0 +1,159 @@
"""RAG (Retrieval Augmented Generation) system for wiki Q&A."""

from dataclasses import dataclass
from typing import Optional

from .embeddings import WikiVectorStore
from .llm import llm_client


SYSTEM_PROMPT = """You are a knowledgeable assistant for the P2P Foundation Wiki, a comprehensive knowledge base about peer-to-peer culture, commons-based peer production, alternative economics, and collaborative governance.

Your role is to answer questions about the wiki content accurately and helpfully. When answering:

1. Base your answers on the provided wiki content excerpts
2. Cite specific articles when relevant (use the article titles)
3. If the provided content doesn't fully answer the question, say so
4. Explain concepts in accessible language while maintaining accuracy
5. Connect related concepts when helpful

If asked about something not covered in the provided content, acknowledge this and suggest related topics that might be helpful."""


@dataclass
class ChatMessage:
    """A chat message."""

    role: str  # "user" or "assistant"
    content: str


@dataclass
class RAGResponse:
    """Response from the RAG system."""

    answer: str
    sources: list[dict]  # List of source articles used
    query: str


class WikiRAG:
    """RAG system for answering questions about wiki content."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()
        self.conversation_history: list[ChatMessage] = []

    def _format_context(self, search_results: list[dict]) -> str:
        """Format search results as context for the LLM."""
        if not search_results:
            return "No relevant wiki content found for this query."

        context_parts = []
        for i, result in enumerate(search_results, 1):
            title = result["metadata"].get("title", "Unknown")
            content = result["content"]
            categories = result["metadata"].get("categories", "")

            context_parts.append(
                f"[Source {i}: {title}]\n"
                f"Categories: {categories}\n"
                f"Content:\n{content}\n"
            )

        return "\n---\n".join(context_parts)

    def _build_prompt(self, query: str, context: str) -> str:
        """Build the prompt for the LLM."""
        # Include recent conversation history for context
        history_text = ""
        if self.conversation_history:
            recent = self.conversation_history[-4:]  # Last 2 exchanges
            history_text = "\n\nRecent conversation:\n"
            for msg in recent:
                role = "User" if msg.role == "user" else "Assistant"
                # Truncate long messages
                content = msg.content[:500] + "..." if len(msg.content) > 500 else msg.content
                history_text += f"{role}: {content}\n"

        return f"""Based on the following wiki content, please answer the user's question.

Wiki Content:
{context}
{history_text}
User Question: {query}

Please provide a helpful answer based on the wiki content above. Cite specific articles when relevant."""

    async def ask(
        self,
        query: str,
        n_results: int = 5,
filter_categories: Optional[list[str]] = None,
|
||||
) -> RAGResponse:
|
||||
"""
|
||||
Ask a question and get an answer based on wiki content.
|
||||
|
||||
Args:
|
||||
query: User's question
|
||||
n_results: Number of relevant chunks to retrieve
|
||||
filter_categories: Optional category filter
|
||||
|
||||
Returns:
|
||||
RAGResponse with answer and sources
|
||||
"""
|
||||
# Search for relevant content
|
||||
search_results = self.vector_store.search(
|
||||
query, n_results=n_results, filter_categories=filter_categories
|
||||
)
|
||||
|
||||
# Format context
|
||||
context = self._format_context(search_results)
|
||||
|
||||
# Build prompt
|
||||
prompt = self._build_prompt(query, context)
|
||||
|
||||
# Get LLM response (use Ollama for chat by default)
|
||||
answer = await llm_client.chat(
|
||||
prompt,
|
||||
system=SYSTEM_PROMPT,
|
||||
use_claude=False, # Use Ollama for chat
|
||||
temperature=0.7,
|
||||
)
|
||||
|
||||
# Update conversation history
|
||||
self.conversation_history.append(ChatMessage(role="user", content=query))
|
||||
self.conversation_history.append(ChatMessage(role="assistant", content=answer))
|
||||
|
||||
# Extract unique sources
|
||||
sources = []
|
||||
seen_titles = set()
|
||||
for result in search_results:
|
||||
title = result["metadata"].get("title", "Unknown")
|
||||
if title not in seen_titles:
|
||||
seen_titles.add(title)
|
||||
sources.append(
|
||||
{
|
||||
"title": title,
|
||||
"article_id": result["metadata"].get("article_id"),
|
||||
"categories": result["metadata"].get("categories", "").split(","),
|
||||
}
|
||||
)
|
||||
|
||||
return RAGResponse(answer=answer, sources=sources, query=query)
|
||||
|
||||
def clear_history(self):
|
||||
"""Clear conversation history."""
|
||||
self.conversation_history = []
|
||||
|
||||
def get_suggestions(self, partial_query: str, n_results: int = 5) -> list[str]:
|
||||
"""Get article title suggestions for autocomplete."""
|
||||
# Simple prefix matching on titles
|
||||
all_titles = self.vector_store.get_article_titles()
|
||||
partial_lower = partial_query.lower()
|
||||
|
||||
suggestions = [
|
||||
title for title in all_titles if partial_lower in title.lower()
|
||||
][:n_results]
|
||||
|
||||
return suggestions
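The source-dedup step at the end of `ask` (collapsing chunk-level search hits into one entry per article) can be exercised in isolation; a minimal standalone sketch, where the `hits` data is invented for illustration:

```python
def unique_sources(search_results: list[dict]) -> list[dict]:
    """Collapse chunk-level hits into one source entry per article title."""
    sources, seen_titles = [], set()
    for result in search_results:
        title = result["metadata"].get("title", "Unknown")
        if title not in seen_titles:
            seen_titles.add(title)
            sources.append({
                "title": title,
                "article_id": result["metadata"].get("article_id"),
                "categories": result["metadata"].get("categories", "").split(","),
            })
    return sources

# Hypothetical sample: two chunks from the same article plus one other.
hits = [
    {"metadata": {"title": "Commons", "article_id": 7, "categories": "Economics,Governance"}},
    {"metadata": {"title": "Commons", "article_id": 7, "categories": "Economics,Governance"}},
    {"metadata": {"title": "Peer Production", "article_id": 12, "categories": ""}},
]
print([s["title"] for s in unique_sources(hits)])  # → ['Commons', 'Peer Production']
```

Note that first-seen order is preserved, so the highest-ranked chunk determines where an article appears in the source list.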
@ -0,0 +1,707 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>P2P Wiki AI</title>
    <style>
        :root {
            --bg-primary: #1a1a2e;
            --bg-secondary: #16213e;
            --bg-tertiary: #0f3460;
            --text-primary: #e8e8e8;
            --text-secondary: #a0a0a0;
            --accent: #e94560;
            --accent-hover: #ff6b6b;
            --success: #4ecdc4;
            --border: #2a2a4a;
        }

        * {
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }

        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
            background: var(--bg-primary);
            color: var(--text-primary);
            min-height: 100vh;
        }

        .container {
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }

        header {
            display: flex;
            justify-content: space-between;
            align-items: center;
            padding: 20px 0;
            border-bottom: 1px solid var(--border);
            margin-bottom: 30px;
        }

        h1 {
            font-size: 1.8em;
            font-weight: 600;
        }

        h1 span {
            color: var(--accent);
        }

        .tabs {
            display: flex;
            gap: 10px;
        }

        .tab {
            padding: 10px 20px;
            background: var(--bg-secondary);
            border: 1px solid var(--border);
            border-radius: 8px;
            cursor: pointer;
            transition: all 0.2s;
        }

        .tab:hover, .tab.active {
            background: var(--bg-tertiary);
            border-color: var(--accent);
        }

        .panel {
            display: none;
        }

        .panel.active {
            display: block;
        }

        /* Chat Panel */
        .chat-container {
            display: flex;
            flex-direction: column;
            height: calc(100vh - 200px);
            background: var(--bg-secondary);
            border-radius: 12px;
            overflow: hidden;
        }

        .chat-messages {
            flex: 1;
            overflow-y: auto;
            padding: 20px;
        }

        .message {
            margin-bottom: 20px;
            max-width: 80%;
        }

        .message.user {
            margin-left: auto;
        }

        .message-content {
            padding: 15px;
            border-radius: 12px;
            line-height: 1.6;
        }

        .message.user .message-content {
            background: var(--bg-tertiary);
        }

        .message.assistant .message-content {
            background: var(--bg-primary);
            border: 1px solid var(--border);
        }

        .message-sources {
            margin-top: 10px;
            padding: 10px;
            background: rgba(233, 69, 96, 0.1);
            border-radius: 8px;
            font-size: 0.9em;
        }

        .message-sources h4 {
            color: var(--accent);
            margin-bottom: 5px;
        }

        .source-tag {
            display: inline-block;
            padding: 3px 8px;
            margin: 2px;
            background: var(--bg-tertiary);
            border-radius: 4px;
            font-size: 0.85em;
        }

        .chat-input {
            display: flex;
            gap: 10px;
            padding: 20px;
            background: var(--bg-primary);
            border-top: 1px solid var(--border);
        }

        .chat-input input {
            flex: 1;
            padding: 15px;
            background: var(--bg-secondary);
            border: 1px solid var(--border);
            border-radius: 8px;
            color: var(--text-primary);
            font-size: 1em;
        }

        .chat-input input:focus {
            outline: none;
            border-color: var(--accent);
        }

        .chat-input button {
            padding: 15px 30px;
            background: var(--accent);
            border: none;
            border-radius: 8px;
            color: white;
            font-weight: 600;
            cursor: pointer;
            transition: background 0.2s;
        }

        .chat-input button:hover {
            background: var(--accent-hover);
        }

        .chat-input button:disabled {
            opacity: 0.5;
            cursor: not-allowed;
        }

        /* Ingress Panel */
        .ingress-container {
            background: var(--bg-secondary);
            border-radius: 12px;
            padding: 30px;
        }

        .ingress-form {
            display: flex;
            gap: 10px;
            margin-bottom: 30px;
        }

        .ingress-form input {
            flex: 1;
            padding: 15px;
            background: var(--bg-primary);
            border: 1px solid var(--border);
            border-radius: 8px;
            color: var(--text-primary);
            font-size: 1em;
        }

        .ingress-form input:focus {
            outline: none;
            border-color: var(--accent);
        }

        .ingress-form button {
            padding: 15px 30px;
            background: var(--success);
            border: none;
            border-radius: 8px;
            color: var(--bg-primary);
            font-weight: 600;
            cursor: pointer;
            transition: opacity 0.2s;
        }

        .ingress-form button:hover {
            opacity: 0.9;
        }

        .ingress-form button:disabled {
            opacity: 0.5;
            cursor: not-allowed;
        }

        .ingress-result {
            background: var(--bg-primary);
            border-radius: 8px;
            padding: 20px;
            margin-bottom: 20px;
        }

        .ingress-result h3 {
            margin-bottom: 15px;
            color: var(--accent);
        }

        .result-stats {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
            gap: 15px;
            margin-bottom: 20px;
        }

        .stat {
            background: var(--bg-secondary);
            padding: 15px;
            border-radius: 8px;
            text-align: center;
        }

        .stat-value {
            font-size: 2em;
            font-weight: bold;
            color: var(--success);
        }

        .stat-label {
            color: var(--text-secondary);
            font-size: 0.9em;
        }

        /* Review Panel */
        .review-container {
            background: var(--bg-secondary);
            border-radius: 12px;
            padding: 30px;
        }

        .review-item {
            background: var(--bg-primary);
            border-radius: 8px;
            padding: 20px;
            margin-bottom: 20px;
        }

        .review-item h3 {
            margin-bottom: 10px;
        }

        .review-meta {
            color: var(--text-secondary);
            font-size: 0.9em;
            margin-bottom: 15px;
        }

        .review-section {
            margin-top: 20px;
            padding-top: 20px;
            border-top: 1px solid var(--border);
        }

        .review-section h4 {
            margin-bottom: 10px;
            color: var(--accent);
        }

        .match-item, .draft-item {
            background: var(--bg-secondary);
            padding: 15px;
            border-radius: 8px;
            margin-bottom: 10px;
        }

        .match-item .title, .draft-item .title {
            font-weight: 600;
            margin-bottom: 5px;
        }

        .match-item .score {
            color: var(--success);
        }

        .action-buttons {
            display: flex;
            gap: 10px;
            margin-top: 10px;
        }

        .btn-approve {
            padding: 8px 16px;
            background: var(--success);
            border: none;
            border-radius: 4px;
            color: var(--bg-primary);
            cursor: pointer;
        }

        .btn-reject {
            padding: 8px 16px;
            background: var(--accent);
            border: none;
            border-radius: 4px;
            color: white;
            cursor: pointer;
        }

        .loading {
            display: inline-block;
            width: 20px;
            height: 20px;
            border: 2px solid var(--text-secondary);
            border-top-color: var(--accent);
            border-radius: 50%;
            animation: spin 1s linear infinite;
        }

        @keyframes spin {
            to { transform: rotate(360deg); }
        }

        .empty-state {
            text-align: center;
            padding: 50px;
            color: var(--text-secondary);
        }

        /* Markdown-like formatting */
        .message-content p { margin-bottom: 10px; }
        .message-content ul, .message-content ol { margin-left: 20px; margin-bottom: 10px; }
        .message-content code { background: var(--bg-tertiary); padding: 2px 6px; border-radius: 4px; }
        .message-content pre { background: var(--bg-tertiary); padding: 15px; border-radius: 8px; overflow-x: auto; }
    </style>
</head>
<body>
    <div class="container">
        <header>
            <h1>P2P Wiki <span>AI</span></h1>
            <div class="tabs">
                <div class="tab active" data-panel="chat">Chat</div>
                <div class="tab" data-panel="ingress">Ingress</div>
                <div class="tab" data-panel="review">Review Queue</div>
            </div>
        </header>

        <!-- Chat Panel -->
        <div id="chat" class="panel active">
            <div class="chat-container">
                <div class="chat-messages" id="chatMessages">
                    <div class="message assistant">
                        <div class="message-content">
                            <p>Welcome to the P2P Wiki AI assistant! I can help you explore the P2P Foundation Wiki's knowledge about peer-to-peer culture, commons-based peer production, alternative economics, and collaborative governance.</p>
                            <p>Ask me anything about these topics!</p>
                        </div>
                    </div>
                </div>
                <div class="chat-input">
                    <input type="text" id="chatInput" placeholder="Ask about P2P, commons, cooperative economics..." />
                    <button id="chatSend">Send</button>
                </div>
            </div>
        </div>

        <!-- Ingress Panel -->
        <div id="ingress" class="panel">
            <div class="ingress-container">
                <h2>Article Ingress</h2>
                <p style="color: var(--text-secondary); margin-bottom: 20px;">
                    Drop an article URL to analyze it for wiki content. The AI will identify relevant topics,
                    find matching wiki articles for citations, and draft new articles.
                </p>
                <div class="ingress-form">
                    <input type="url" id="ingressUrl" placeholder="https://example.com/article-about-commons" />
                    <button id="ingressSubmit">Process Article</button>
                </div>
                <div id="ingressResult"></div>
            </div>
        </div>

        <!-- Review Panel -->
        <div id="review" class="panel">
            <div class="review-container">
                <h2>Review Queue</h2>
                <p style="color: var(--text-secondary); margin-bottom: 20px;">
                    Review and approve AI-generated wiki content before it's added to the wiki.
                </p>
                <div id="reviewItems">
                    <div class="empty-state">Loading review items...</div>
                </div>
            </div>
        </div>
    </div>

    <script>
        const API_BASE = ''; // Same origin

        // Tab switching
        document.querySelectorAll('.tab').forEach(tab => {
            tab.addEventListener('click', () => {
                document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
                document.querySelectorAll('.panel').forEach(p => p.classList.remove('active'));
                tab.classList.add('active');
                document.getElementById(tab.dataset.panel).classList.add('active');

                // Load review items when switching to review tab
                if (tab.dataset.panel === 'review') {
                    loadReviewItems();
                }
            });
        });

        // Chat functionality
        const chatMessages = document.getElementById('chatMessages');
        const chatInput = document.getElementById('chatInput');
        const chatSend = document.getElementById('chatSend');

        function addMessage(content, role, sources = []) {
            const div = document.createElement('div');
            div.className = `message ${role}`;

            let html = `<div class="message-content">${formatMessage(content)}</div>`;

            if (sources.length > 0) {
                html += `<div class="message-sources">
                    <h4>Sources</h4>
                    ${sources.map(s => `<span class="source-tag">${s.title}</span>`).join('')}
                </div>`;
            }

            div.innerHTML = html;
            chatMessages.appendChild(div);
            chatMessages.scrollTop = chatMessages.scrollHeight;
        }

        function formatMessage(text) {
            // Basic markdown-like formatting
            return text
                .replace(/\n\n/g, '</p><p>')
                .replace(/\n/g, '<br>')
                .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
                .replace(/\*(.+?)\*/g, '<em>$1</em>')
                .replace(/`(.+?)`/g, '<code>$1</code>');
        }

        async function sendChat() {
            const query = chatInput.value.trim();
            if (!query) return;

            chatInput.value = '';
            chatSend.disabled = true;

            addMessage(query, 'user');

            // Show loading
            const loadingDiv = document.createElement('div');
            loadingDiv.className = 'message assistant';
            loadingDiv.innerHTML = '<div class="message-content"><span class="loading"></span> Thinking...</div>';
            chatMessages.appendChild(loadingDiv);
            chatMessages.scrollTop = chatMessages.scrollHeight;

            try {
                const response = await fetch(`${API_BASE}/chat`, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ query, n_results: 5 })
                });

                const data = await response.json();

                chatMessages.removeChild(loadingDiv);

                if (response.ok) {
                    addMessage(data.answer, 'assistant', data.sources);
                } else {
                    addMessage(`Error: ${data.detail || 'Something went wrong'}`, 'assistant');
                }
            } catch (error) {
                chatMessages.removeChild(loadingDiv);
                addMessage(`Error: ${error.message}`, 'assistant');
            }

            chatSend.disabled = false;
            chatInput.focus();
        }

        chatSend.addEventListener('click', sendChat);
        chatInput.addEventListener('keypress', (e) => {
            if (e.key === 'Enter') sendChat();
        });

        // Ingress functionality
        const ingressUrl = document.getElementById('ingressUrl');
        const ingressSubmit = document.getElementById('ingressSubmit');
        const ingressResult = document.getElementById('ingressResult');

        async function processIngress() {
            const url = ingressUrl.value.trim();
            if (!url) return;

            ingressSubmit.disabled = true;
            ingressSubmit.textContent = 'Processing...';

            ingressResult.innerHTML = `
                <div class="ingress-result">
                    <h3>Processing Article</h3>
                    <p><span class="loading"></span> Scraping and analyzing content...</p>
                </div>
            `;

            try {
                const response = await fetch(`${API_BASE}/ingress`, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ url })
                });

                const data = await response.json();

                if (response.ok) {
                    ingressResult.innerHTML = `
                        <div class="ingress-result">
                            <h3>Analysis Complete: ${data.scraped_title || 'Article'}</h3>
                            <div class="result-stats">
                                <div class="stat">
                                    <div class="stat-value">${data.topics_found}</div>
                                    <div class="stat-label">Topics Found</div>
                                </div>
                                <div class="stat">
                                    <div class="stat-value">${data.wiki_matches}</div>
                                    <div class="stat-label">Wiki Matches</div>
                                </div>
                                <div class="stat">
                                    <div class="stat-value">${data.drafts_generated}</div>
                                    <div class="stat-label">Drafts Generated</div>
                                </div>
                            </div>
                            <p style="color: var(--success);">
                                Results added to review queue. Check the Review Queue tab to approve or reject suggestions.
                            </p>
                        </div>
                    `;
                } else {
                    ingressResult.innerHTML = `
                        <div class="ingress-result">
                            <h3 style="color: var(--accent);">Error</h3>
                            <p>${data.detail || 'Failed to process article'}</p>
                        </div>
                    `;
                }
            } catch (error) {
                ingressResult.innerHTML = `
                    <div class="ingress-result">
                        <h3 style="color: var(--accent);">Error</h3>
                        <p>${error.message}</p>
                    </div>
                `;
            }

            ingressSubmit.disabled = false;
            ingressSubmit.textContent = 'Process Article';
        }

        ingressSubmit.addEventListener('click', processIngress);
        ingressUrl.addEventListener('keypress', (e) => {
            if (e.key === 'Enter') processIngress();
        });

        // Review functionality
        const reviewItems = document.getElementById('reviewItems');

        async function loadReviewItems() {
            try {
                const response = await fetch(`${API_BASE}/review`);
                const data = await response.json();

                if (data.count === 0) {
                    reviewItems.innerHTML = '<div class="empty-state">No items in the review queue.</div>';
                    return;
                }

                reviewItems.innerHTML = data.items.map(item => `
                    <div class="review-item">
                        <h3>${item.scraped?.title || 'Unknown Article'}</h3>
                        <div class="review-meta">
                            Source: <a href="${item.scraped?.url}" target="_blank">${item.scraped?.domain}</a>
                            | Processed: ${new Date(item.timestamp).toLocaleString()}
                        </div>

                        ${item.wiki_matches?.length > 0 ? `
                            <div class="review-section">
                                <h4>Suggested Citations (${item.wiki_matches.length})</h4>
                                ${item.wiki_matches.map((match, i) => `
                                    <div class="match-item" ${match.approved ? 'style="opacity: 0.5"' : ''}>
                                        <div class="title">${match.title}</div>
                                        <div class="score">Relevance: ${(match.relevance_score * 100).toFixed(0)}%</div>
                                        <div>${match.suggested_citation}</div>
                                        ${!match.approved && !match.rejected ? `
                                            <div class="action-buttons">
                                                <button class="btn-approve" onclick="reviewAction('${item._filepath}', 'match', ${i}, 'approve')">Approve</button>
                                                <button class="btn-reject" onclick="reviewAction('${item._filepath}', 'match', ${i}, 'reject')">Reject</button>
                                            </div>
                                        ` : `<em>${match.approved ? 'Approved' : 'Rejected'}</em>`}
                                    </div>
                                `).join('')}
                            </div>
                        ` : ''}

                        ${item.draft_articles?.length > 0 ? `
                            <div class="review-section">
                                <h4>Draft Articles (${item.draft_articles.length})</h4>
                                ${item.draft_articles.map((draft, i) => `
                                    <div class="draft-item" ${draft.approved ? 'style="opacity: 0.5"' : ''}>
                                        <div class="title">${draft.title}</div>
                                        <div style="color: var(--text-secondary); font-size: 0.9em; margin-bottom: 10px;">
                                            ${draft.summary || ''}
                                        </div>
                                        <details>
                                            <summary style="cursor: pointer; color: var(--accent);">View Draft Content</summary>
                                            <pre style="margin-top: 10px; white-space: pre-wrap; font-size: 0.85em;">${draft.content}</pre>
                                        </details>
                                        ${!draft.approved && !draft.rejected ? `
                                            <div class="action-buttons">
                                                <button class="btn-approve" onclick="reviewAction('${item._filepath}', 'draft', ${i}, 'approve')">Approve</button>
                                                <button class="btn-reject" onclick="reviewAction('${item._filepath}', 'draft', ${i}, 'reject')">Reject</button>
                                            </div>
                                        ` : `<em>${draft.approved ? 'Approved' : 'Rejected'}</em>`}
                                    </div>
                                `).join('')}
                            </div>
                        ` : ''}
                    </div>
                `).join('');
            } catch (error) {
                reviewItems.innerHTML = `<div class="empty-state">Error loading review items: ${error.message}</div>`;
            }
        }

        async function reviewAction(filepath, itemType, itemIndex, action) {
            try {
                const response = await fetch(`${API_BASE}/review/action`, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({
                        filepath,
                        item_type: itemType,
                        item_index: itemIndex,
                        action
                    })
                });

                if (response.ok) {
                    loadReviewItems(); // Refresh the list
                } else {
                    const data = await response.json();
                    alert(`Error: ${data.detail || 'Action failed'}`);
                }
            } catch (error) {
                alert(`Error: ${error.message}`);
            }
        }

        // Make reviewAction available globally
        window.reviewAction = reviewAction;
    </script>
</body>
</html>