Initial commit: P2P Wiki AI system
- RAG-based chat with 39k wiki articles (232k chunks)
- Article ingress pipeline for processing external URLs
- Review queue for AI-generated content
- FastAPI backend with web UI
- Traefik-ready Docker setup for p2pwiki.jeffemmett.com

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit 4ebd90cc64
@ -0,0 +1,17 @@
# P2P Wiki AI Configuration

# Ollama (Local LLM)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2

# Claude API (Optional - for higher quality article drafts)
ANTHROPIC_API_KEY=
CLAUDE_MODEL=claude-sonnet-4-20250514

# Hybrid Routing
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true

# Server
HOST=0.0.0.0
PORT=8420
@ -0,0 +1,28 @@
# Virtual environment
.venv/
venv/
env/

# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/

# Data files (too large for git)
data/articles.json
data/chroma/
data/review_queue/
xmldump/
xmldump-2014.tar.gz
articles/
articles.tar.gz

# Environment
.env

# IDE
.idea/
.vscode/
*.swp
@ -0,0 +1,49 @@
# P2P Wiki AI - Multi-stage build
FROM python:3.11-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY pyproject.toml .
RUN pip install --no-cache-dir build && \
    pip wheel --no-cache-dir --wheel-dir /wheels .

# Production image
FROM python:3.11-slim

WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libxml2 \
    && rm -rf /var/lib/apt/lists/*

# Copy wheels and install
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels

# Copy application code
COPY src/ src/
COPY web/ web/

# Create data directories
RUN mkdir -p data/chroma data/review_queue

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Expose port
EXPOSE 8420

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8420/health')" || exit 1

# Run the application
CMD ["python", "-m", "uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8420"]
@ -0,0 +1,199 @@

# P2P Wiki AI

AI-augmented system for the P2P Foundation Wiki with two main features:

1. **Conversational Agent** - Ask questions about the 23,000+ wiki articles using RAG (Retrieval-Augmented Generation)
2. **Article Ingress Pipeline** - Drop article URLs to automatically analyze content, find matching wiki articles for citations, and generate draft articles

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     P2P Wiki AI System                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────┐        ┌─────────────────┐               │
│   │   Chat (Q&A)    │        │  Ingress Tool   │               │
│   │    via RAG      │        │   (URL Drop)    │               │
│   └────────┬────────┘        └────────┬────────┘               │
│            │                          │                         │
│            └───────────┬──────────────┘                         │
│                        ▼                                        │
│            ┌───────────────────────┐                            │
│            │    FastAPI Backend    │                            │
│            └───────────┬───────────┘                            │
│                        │                                        │
│         ┌──────────────┼──────────────┐                         │
│         ▼              ▼              ▼                         │
│   ┌──────────┐  ┌─────────────┐  ┌──────────────┐              │
│   │ ChromaDB │  │   Ollama/   │  │   Article    │              │
│   │ (Vector) │  │   Claude    │  │   Scraper    │              │
│   └──────────┘  └─────────────┘  └──────────────┘              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
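The chat path in the diagram is plain RAG: retrieve the top-k chunks from ChromaDB, then ask the LLM to answer only from those excerpts. A minimal sketch of the prompt-assembly step; the function and field names here are illustrative, not the actual `src/rag.py` API:

```python
def build_rag_prompt(query: str, chunks: list[dict], max_chunks: int = 5) -> str:
    """Assemble retrieved wiki chunks into a grounded prompt (illustrative sketch)."""
    context_blocks = []
    for i, chunk in enumerate(chunks[:max_chunks]):
        # Each retrieved chunk carries its source article title as metadata
        context_blocks.append(f"[{i + 1}] {chunk['title']}\n{chunk['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the wiki excerpts below. "
        "Cite sources by their [n] markers.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is commons-based peer production?",
    [{"title": "Commons-Based Peer Production", "text": "A model of socioeconomic production..."}],
)
```

Keeping the source markers in the prompt is what lets the backend return a `sources` list alongside each answer.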

## Quick Start

### 1. Prerequisites

- Python 3.10+
- [Ollama](https://ollama.ai) installed locally (or access to a remote Ollama server)
- Optional: Anthropic API key for Claude (higher-quality article drafts)

### 2. Install Dependencies

```bash
cd /home/jeffe/Github/p2pwiki-content
pip install -e .
```

### 3. Parse Wiki Content

Convert the MediaWiki XML dumps to searchable JSON:

```bash
python -m src.parser
```

This creates `data/articles.json` with all parsed articles (~23,000 pages).

### 4. Generate Embeddings

Create the vector store for semantic search:

```bash
python -m src.embeddings
```

This creates the ChromaDB vector store in `data/chroma/`. It takes a few minutes.
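The embedding step splits each article into overlapping ~1,000-character chunks with a 200-character overlap (see `CHUNK_SIZE` and `CHUNK_OVERLAP` in `src/embeddings.py`). A simplified version of that sliding window, omitting the sentence-boundary snapping the real code does:

```python
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def chunk_text(text: str) -> list[str]:
    """Split text into overlapping windows (simplified from src.embeddings)."""
    if len(text) <= CHUNK_SIZE:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + CHUNK_SIZE])
        # Step forward, keeping CHUNK_OVERLAP characters of shared context
        start += CHUNK_SIZE - CHUNK_OVERLAP
    return chunks

parts = chunk_text("x" * 2500)
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.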

### 5. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings
```

### 6. Run the Server

```bash
python -m src.api
```

Visit http://localhost:8420/ui for the web interface.

## Docker Deployment

For production deployment on the RS 8000:

```bash
# Build and run
docker compose up -d --build

# Check logs
docker compose logs -f

# Access at http://localhost:8420/ui
# Or via Traefik at https://wiki-ai.jeffemmett.com
```

## API Endpoints

### Chat

```bash
# Ask a question
curl -X POST http://localhost:8420/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is commons-based peer production?"}'
```
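The response mirrors the `ChatResponse` model in `src/api.py`: an `answer` string, a `sources` list, and the echoed `query`. A small sketch of building the request body and reading a response of that shape; the sample payload below is illustrative, not real output:

```python
import json

def make_chat_payload(query: str, n_results: int = 5) -> str:
    """Serialize a /chat request body matching the ChatRequest model."""
    return json.dumps({"query": query, "n_results": n_results})

def summarize_chat_response(raw: str) -> str:
    """Pull the answer plus source titles out of a /chat response body."""
    data = json.loads(raw)
    titles = [s.get("title", "?") for s in data["sources"]]
    return f"{data['answer']} (sources: {', '.join(titles)})"

# Illustrative response in the documented shape
sample = json.dumps({
    "answer": "CBPP is a model of open, collaborative production.",
    "sources": [{"title": "Commons-Based Peer Production"}],
    "query": "What is commons-based peer production?",
})
summary = summarize_chat_response(sample)
```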

### Ingress

```bash
# Process an external article
curl -X POST http://localhost:8420/ingress \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article-about-cooperatives"}'
```

### Review Queue

```bash
# Get all items in the review queue
curl http://localhost:8420/review

# Approve a draft article
curl -X POST http://localhost:8420/review/action \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/item.json", "item_type": "draft", "item_index": 0, "action": "approve"}'
```

### Search

```bash
# Direct vector search
curl "http://localhost:8420/search?q=cooperative%20economics&n=10"

# List article titles
curl "http://localhost:8420/articles?limit=100"
```
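`/articles` pages through titles with the same `titles[offset : offset + limit]` slice used in `src/api.py`, so the last page may be short and out-of-range offsets simply return an empty list. A quick sketch of both the query-string encoding and the slice semantics:

```python
from urllib.parse import urlencode

def search_url(base: str, q: str, n: int = 10) -> str:
    """Build a /search URL with a properly percent-encoded query."""
    return f"{base}/search?{urlencode({'q': q, 'n': n})}"

def page(titles: list[str], limit: int = 100, offset: int = 0) -> list[str]:
    """Same slice the /articles endpoint uses; out-of-range offsets yield []."""
    return titles[offset : offset + limit]

url = search_url("http://localhost:8420", "cooperative economics")
titles = [f"Article {i}" for i in range(250)]
```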

## Hybrid AI Routing

The system routes between a local LLM (Ollama) and a cloud LLM (Claude) depending on the task:

| Task | Default LLM | Reasoning |
|------|-------------|-----------|
| Chat Q&A | Ollama | Fast, free, good enough for retrieval-based answers |
| Content Analysis | Claude | Better at extracting topics and identifying wiki relevance |
| Draft Generation | Claude | Higher-quality article writing |
| Embeddings | Local (sentence-transformers) | Fast, free, optimized for semantic search |

Configure in `.env`:

```
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true
```
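Resolved at startup, those flags reduce to a small per-task lookup. A sketch of how the routing could be expressed; the actual dispatch lives in `src/llm.py` and may differ (in particular, tying content analysis to the drafts flag is an assumption based on the table above):

```python
from dataclasses import dataclass

@dataclass
class RoutingConfig:
    use_claude_for_drafts: bool = True
    use_ollama_for_chat: bool = True

def pick_backend(task: str, cfg: RoutingConfig) -> str:
    """Choose an LLM backend per task, following the defaults in the table."""
    if task == "chat":
        return "ollama" if cfg.use_ollama_for_chat else "claude"
    if task in ("draft", "analysis"):
        # Assumption: analysis follows the same flag as drafting
        return "claude" if cfg.use_claude_for_drafts else "ollama"
    raise ValueError(f"unknown task: {task}")
```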

## Project Structure

```
p2pwiki-content/
├── src/
│   ├── api.py          # FastAPI backend
│   ├── config.py       # Configuration settings
│   ├── embeddings.py   # Vector store (ChromaDB)
│   ├── ingress.py      # Article scraper & analyzer
│   ├── llm.py          # LLM client (Ollama/Claude)
│   ├── parser.py       # MediaWiki XML parser
│   └── rag.py          # RAG chat system
├── web/
│   └── index.html      # Web UI
├── data/
│   ├── articles.json   # Parsed wiki content
│   ├── chroma/         # Vector store
│   └── review_queue/   # Pending ingress items
├── xmldump/            # MediaWiki XML dumps
├── docker-compose.yml
├── Dockerfile
└── pyproject.toml
```

## Content Coverage

The P2P Foundation Wiki contains ~23,000 articles covering:

- Peer-to-peer networks and culture
- Commons-based peer production (CBPP)
- Alternative economics and post-capitalism
- Cooperative business models
- Open source and free culture
- Collaborative governance
- Sustainability and ecology

## License

The wiki content is from the P2P Foundation under their respective licenses.
The AI system code is provided as-is for educational purposes.
@ -0,0 +1,38 @@
version: '3.8'

services:
  p2pwiki-ai:
    build: .
    container_name: p2pwiki-ai
    restart: unless-stopped
    ports:
      - "8420:8420"
    volumes:
      # Persist vector store and review queue
      - ./data:/app/data
      # Mount XML dumps for parsing (read-only)
      - ./xmldump:/app/xmldump:ro
    environment:
      # Ollama connection (adjust host for your setup)
      - OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://host.docker.internal:11434}
      - OLLAMA_MODEL=${OLLAMA_MODEL:-llama3.2}
      # Claude API (optional, for higher quality drafts)
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
      - CLAUDE_MODEL=${CLAUDE_MODEL:-claude-sonnet-4-20250514}
      # Hybrid routing settings
      - USE_CLAUDE_FOR_DRAFTS=${USE_CLAUDE_FOR_DRAFTS:-true}
      - USE_OLLAMA_FOR_CHAT=${USE_OLLAMA_FOR_CHAT:-true}
    labels:
      # Traefik labels for reverse proxy
      - "traefik.enable=true"
      - "traefik.http.routers.p2pwiki-ai.rule=Host(`p2pwiki.jeffemmett.com`)"
      - "traefik.http.services.p2pwiki-ai.loadbalancer.server.port=8420"
    networks:
      - traefik-public
    # Map host.docker.internal so the container can reach services on the host
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  traefik-public:
    external: true
File diff suppressed because it is too large
@ -0,0 +1,64 @@
[project]
name = "p2pwiki-ai"
version = "0.1.0"
description = "AI-augmented system for P2P Foundation Wiki - chat agent and ingress pipeline"
requires-python = ">=3.10"
dependencies = [
    # Core
    "fastapi>=0.109.0",
    "uvicorn[standard]>=0.27.0",
    "pydantic>=2.5.0",
    "pydantic-settings>=2.1.0",

    # XML parsing
    "lxml>=5.1.0",

    # Vector store & embeddings
    "chromadb>=0.4.22",
    "sentence-transformers>=2.3.0",

    # LLM integration
    "openai>=1.10.0",     # For Ollama-compatible API
    "anthropic>=0.18.0",  # For Claude API
    "httpx>=0.26.0",

    # Article scraping
    "trafilatura>=1.6.0",
    "newspaper3k>=0.2.8",
    "beautifulsoup4>=4.12.0",
    "requests>=2.31.0",

    # Utilities
    "python-dotenv>=1.0.0",
    "rich>=13.7.0",
    "tqdm>=4.66.0",
    "tenacity>=8.2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.23.0",
    "black>=24.1.0",
    "ruff>=0.1.0",
]

[project.scripts]
p2pwiki-parse = "src.parser:main"
p2pwiki-embed = "src.embeddings:main"
p2pwiki-serve = "src.api:main"

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["."]

[tool.black]
line-length = 100
target-version = ["py310"]

[tool.ruff]
line-length = 100
select = ["E", "F", "I", "N", "W"]
@ -0,0 +1 @@
"""P2P Wiki AI System - Chat agent and ingress pipeline."""
@ -0,0 +1,320 @@
"""FastAPI backend for P2P Wiki AI system."""

import asyncio
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Optional

from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse
from pydantic import BaseModel, HttpUrl

from .config import settings
from .embeddings import WikiVectorStore
from .rag import WikiRAG, RAGResponse
from .ingress import IngressPipeline, get_review_queue, approve_item, reject_item

# Global instances
vector_store: Optional[WikiVectorStore] = None
rag_system: Optional[WikiRAG] = None
ingress_pipeline: Optional[IngressPipeline] = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize services on startup."""
    global vector_store, rag_system, ingress_pipeline

    print("Initializing P2P Wiki AI system...")

    # Check if vector store has been populated
    chroma_path = settings.chroma_persist_dir
    if not chroma_path.exists() or not any(chroma_path.iterdir()):
        print("WARNING: Vector store not initialized. Run 'python -m src.parser' and 'python -m src.embeddings' first.")
    else:
        vector_store = WikiVectorStore()
        rag_system = WikiRAG(vector_store)
        ingress_pipeline = IngressPipeline(vector_store)
        print(f"Loaded vector store with {vector_store.get_stats()['total_chunks']} chunks")

    yield

    print("Shutting down...")


app = FastAPI(
    title="P2P Wiki AI",
    description="AI-augmented system for P2P Foundation Wiki - chat agent and ingress pipeline",
    version="0.1.0",
    lifespan=lifespan,
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure appropriately for production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# --- Request/Response Models ---


class ChatRequest(BaseModel):
    """Chat request model."""

    query: str
    n_results: int = 5
    filter_categories: Optional[list[str]] = None


class ChatResponse(BaseModel):
    """Chat response model."""

    answer: str
    sources: list[dict]
    query: str


class IngressRequest(BaseModel):
    """Ingress request model."""

    url: HttpUrl


class IngressResponse(BaseModel):
    """Ingress response model."""

    status: str
    message: str
    scraped_title: Optional[str] = None
    topics_found: int = 0
    wiki_matches: int = 0
    drafts_generated: int = 0
    queue_file: Optional[str] = None


class ReviewActionRequest(BaseModel):
    """Review action request model."""

    filepath: str
    item_type: str  # "match" or "draft"
    item_index: int
    action: str  # "approve" or "reject"


# --- API Endpoints ---


@app.get("/")
async def root():
    """Root endpoint."""
    return {
        "name": "P2P Wiki AI",
        "version": "0.1.0",
        "status": "running",
        "vector_store_ready": vector_store is not None,
    }


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "vector_store_ready": vector_store is not None,
    }


@app.get("/stats")
async def stats():
    """Get system statistics."""
    if not vector_store:
        return {"error": "Vector store not initialized"}

    return {
        "vector_store": vector_store.get_stats(),
        "review_queue_count": len(get_review_queue()),
    }


# --- Chat Endpoints ---


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Chat with the wiki knowledge base."""
    if not rag_system:
        raise HTTPException(
            status_code=503,
            detail="RAG system not initialized. Run indexing first.",
        )

    response = await rag_system.ask(
        query=request.query,
        n_results=request.n_results,
        filter_categories=request.filter_categories,
    )

    return ChatResponse(
        answer=response.answer,
        sources=response.sources,
        query=response.query,
    )


@app.post("/chat/clear")
async def clear_chat():
    """Clear chat history."""
    if rag_system:
        rag_system.clear_history()
    return {"status": "cleared"}


@app.get("/chat/suggestions")
async def chat_suggestions(q: str = ""):
    """Get article title suggestions for autocomplete."""
    if not rag_system or not q:
        return {"suggestions": []}

    suggestions = rag_system.get_suggestions(q)
    return {"suggestions": suggestions}


# --- Ingress Endpoints ---


@app.post("/ingress", response_model=IngressResponse)
async def ingress(request: IngressRequest, background_tasks: BackgroundTasks):
    """
    Process an external article URL through the ingress pipeline.

    This scrapes the article, analyzes it for wiki relevance,
    finds matching existing articles, and generates draft articles.
    """
    if not ingress_pipeline:
        raise HTTPException(
            status_code=503,
            detail="Ingress pipeline not initialized. Run indexing first.",
        )

    try:
        result = await ingress_pipeline.process(str(request.url))

        return IngressResponse(
            status="success",
            message="Article processed successfully",
            scraped_title=result.scraped.title,
            topics_found=len(result.analysis.get("main_topics", [])),
            wiki_matches=len(result.wiki_matches),
            drafts_generated=len(result.draft_articles),
            queue_file=result.timestamp,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


# --- Review Queue Endpoints ---


@app.get("/review")
async def get_review_items():
    """Get all items in the review queue."""
    items = get_review_queue()
    return {"count": len(items), "items": items}


@app.get("/review/{filename}")
async def get_review_item(filename: str):
    """Get a specific review item."""
    filepath = settings.review_queue_dir / filename
    if not filepath.exists():
        raise HTTPException(status_code=404, detail="Review item not found")

    import json

    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    return data


@app.post("/review/action")
async def review_action(request: ReviewActionRequest):
    """Approve or reject a review item."""
    if request.action == "approve":
        success = approve_item(request.filepath, request.item_type, request.item_index)
    elif request.action == "reject":
        success = reject_item(request.filepath, request.item_type, request.item_index)
    else:
        raise HTTPException(status_code=400, detail="Invalid action")

    if success:
        return {"status": "success", "action": request.action}
    else:
        raise HTTPException(status_code=500, detail="Action failed")


# --- Search Endpoints ---


@app.get("/search")
async def search(q: str, n: int = 10, categories: Optional[str] = None):
    """Direct search of the vector store."""
    if not vector_store:
        raise HTTPException(status_code=503, detail="Vector store not initialized")

    filter_cats = categories.split(",") if categories else None
    results = vector_store.search(q, n_results=n, filter_categories=filter_cats)

    return {"query": q, "count": len(results), "results": results}


@app.get("/articles")
async def list_articles(limit: int = 100, offset: int = 0):
    """List article titles."""
    if not vector_store:
        raise HTTPException(status_code=503, detail="Vector store not initialized")

    titles = vector_store.get_article_titles()
    return {
        "total": len(titles),
        "limit": limit,
        "offset": offset,
        "titles": titles[offset : offset + limit],
    }


# --- Static Files (Web UI) ---

web_dir = Path(__file__).parent.parent / "web"
if web_dir.exists():
    app.mount("/static", StaticFiles(directory=str(web_dir)), name="static")


@app.get("/ui")
async def ui():
    """Serve the web UI."""
    index_path = web_dir / "index.html"
    if index_path.exists():
        return FileResponse(index_path)
    raise HTTPException(status_code=404, detail="Web UI not found")


def main():
    """Run the API server."""
    import uvicorn

    uvicorn.run(
        "src.api:app",
        host=settings.host,
        port=settings.port,
        reload=True,
    )


if __name__ == "__main__":
    main()
@ -0,0 +1,51 @@
"""Configuration settings for P2P Wiki AI system."""

from pathlib import Path
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    # Paths
    project_root: Path = Path(__file__).parent.parent
    data_dir: Path = project_root / "data"
    xmldump_dir: Path = project_root / "xmldump"

    # Vector store
    chroma_persist_dir: Path = data_dir / "chroma"
    embedding_model: str = "all-MiniLM-L6-v2"  # Fast, good quality

    # Ollama (local LLM)
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama3.2"  # Default model for local inference

    # Claude API (for complex tasks)
    anthropic_api_key: str = ""
    claude_model: str = "claude-sonnet-4-20250514"

    # Hybrid routing flags
    use_claude_for_drafts: bool = True  # Use Claude for article drafting
    use_ollama_for_chat: bool = True  # Use Ollama for simple Q&A

    # MediaWiki
    mediawiki_api_url: str = ""  # Set if you have a live wiki API

    # Server
    host: str = "0.0.0.0"
    port: int = 8420

    # Review queue
    review_queue_dir: Path = data_dir / "review_queue"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"


settings = Settings()

# Ensure directories exist
settings.data_dir.mkdir(parents=True, exist_ok=True)
settings.chroma_persist_dir.mkdir(parents=True, exist_ok=True)
settings.review_queue_dir.mkdir(parents=True, exist_ok=True)
@ -0,0 +1,256 @@
"""Vector store setup and embedding generation using ChromaDB."""

import json
from pathlib import Path
from typing import Optional

import chromadb
from chromadb.config import Settings as ChromaSettings
from rich.console import Console
from rich.progress import Progress
from sentence_transformers import SentenceTransformer

from .config import settings
from .parser import WikiArticle

console = Console()

# Chunk size for embedding (in characters)
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


class WikiVectorStore:
    """Vector store for wiki articles using ChromaDB."""

    def __init__(self, persist_dir: Optional[Path] = None):
        self.persist_dir = persist_dir or settings.chroma_persist_dir

        # Initialize ChromaDB
        self.client = chromadb.PersistentClient(
            path=str(self.persist_dir),
            settings=ChromaSettings(anonymized_telemetry=False),
        )

        # Create or get collection
        self.collection = self.client.get_or_create_collection(
            name="wiki_articles",
            metadata={"hnsw:space": "cosine"},
        )

        # Load embedding model
        console.print(f"[cyan]Loading embedding model: {settings.embedding_model}[/cyan]")
        self.model = SentenceTransformer(settings.embedding_model)
        console.print("[green]Model loaded[/green]")

    def _chunk_text(self, text: str, title: str) -> list[tuple[str, dict]]:
        """Split text into overlapping chunks with metadata."""
        if len(text) <= CHUNK_SIZE:
            return [(text, {"chunk_index": 0, "total_chunks": 1})]

        chunks = []
        start = 0
        chunk_index = 0

        while start < len(text):
            end = start + CHUNK_SIZE

            # Try to break at a sentence boundary
            if end < len(text):
                # Look for a sentence end within the last 100 chars
                for i in range(min(100, end - start)):
                    if text[end - i] in ".!?\n":
                        end = end - i + 1
                        break

            chunk_text = text[start:end].strip()
            if chunk_text:
                # Prepend title for context
                chunk_with_title = f"{title}\n\n{chunk_text}"
                chunks.append(
                    (chunk_with_title, {"chunk_index": chunk_index, "total_chunks": -1})
                )
                chunk_index += 1

            start = end - CHUNK_OVERLAP

        # Update total_chunks now that the final count is known
        for _, meta in chunks:
            meta["total_chunks"] = len(chunks)

        return chunks

    def get_embedded_article_ids(self) -> set:
        """Get the set of article IDs that are already embedded."""
        results = self.collection.get(include=["metadatas"])
        article_ids = set()
        for meta in results["metadatas"]:
            if meta and "article_id" in meta:
                article_ids.add(meta["article_id"])
        return article_ids

    def add_articles(self, articles: list[WikiArticle], batch_size: int = 100, resume: bool = True):
        """Add articles to the vector store."""
        console.print(f"[cyan]Processing {len(articles)} articles...[/cyan]")

        # Check for already-embedded articles if resuming
        if resume:
            embedded_ids = self.get_embedded_article_ids()
            original_count = len(articles)
            articles = [a for a in articles if a.id not in embedded_ids]
            skipped = original_count - len(articles)
            if skipped > 0:
                console.print(f"[yellow]Skipping {skipped} already-embedded articles[/yellow]")
            if not articles:
                console.print("[green]All articles already embedded![/green]")
                return

        all_chunks = []
        all_ids = []
        all_metadatas = []

        with Progress() as progress:
            task = progress.add_task("[cyan]Chunking articles...", total=len(articles))

            for article in articles:
                if not article.plain_text:
                    progress.advance(task)
                    continue

                chunks = self._chunk_text(article.plain_text, article.title)

                for chunk_text, chunk_meta in chunks:
                    chunk_id = f"{article.id}_{chunk_meta['chunk_index']}"

                    metadata = {
                        "article_id": article.id,
                        "title": article.title,
                        "categories": ",".join(article.categories[:10]),  # Limit categories
                        "timestamp": article.timestamp,
                        "chunk_index": chunk_meta["chunk_index"],
                        "total_chunks": chunk_meta["total_chunks"],
                    }

                    all_chunks.append(chunk_text)
                    all_ids.append(chunk_id)
                    all_metadatas.append(metadata)

                progress.advance(task)

        console.print(f"[cyan]Created {len(all_chunks)} chunks from {len(articles)} articles[/cyan]")

        # Generate embeddings and add in batches
        console.print("[cyan]Generating embeddings and adding to vector store...[/cyan]")

        with Progress() as progress:
            task = progress.add_task(
                "[cyan]Embedding and storing...", total=len(all_chunks) // batch_size + 1
            )

            for i in range(0, len(all_chunks), batch_size):
                batch_chunks = all_chunks[i : i + batch_size]
                batch_ids = all_ids[i : i + batch_size]
                batch_metadatas = all_metadatas[i : i + batch_size]

                # Generate embeddings
                embeddings = self.model.encode(batch_chunks, show_progress_bar=False)

                # Add to collection
                self.collection.add(
                    ids=batch_ids,
                    embeddings=embeddings.tolist(),
                    documents=batch_chunks,
                    metadatas=batch_metadatas,
                )

                progress.advance(task)

        console.print(f"[green]Added {len(all_chunks)} chunks to vector store[/green]")

    def search(
        self,
        query: str,
        n_results: int = 5,
        filter_categories: Optional[list[str]] = None,
    ) -> list[dict]:
        """Search for relevant chunks."""
        query_embedding = self.model.encode([query])[0]

        where_filter = None
        if filter_categories:
            # ChromaDB where filter for categories
            where_filter = {
                "$or": [{"categories": {"$contains": cat}} for cat in filter_categories]
            }

        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=n_results,
            where=where_filter,
            include=["documents", "metadatas", "distances"],
|
||||||
|
)
|
||||||
|
|
||||||
|
# Format results
|
||||||
|
formatted = []
|
||||||
|
if results["documents"] and results["documents"][0]:
|
||||||
|
for i, doc in enumerate(results["documents"][0]):
|
||||||
|
formatted.append(
|
||||||
|
{
|
||||||
|
"content": doc,
|
||||||
|
"metadata": results["metadatas"][0][i],
|
||||||
|
"distance": results["distances"][0][i],
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return formatted
|
||||||
|
|
||||||
|
def get_article_titles(self) -> list[str]:
|
||||||
|
"""Get all unique article titles in the store."""
|
||||||
|
# Get all metadata
|
||||||
|
results = self.collection.get(include=["metadatas"])
|
||||||
|
titles = set()
|
||||||
|
for meta in results["metadatas"]:
|
||||||
|
if meta and "title" in meta:
|
||||||
|
titles.add(meta["title"])
|
||||||
|
return sorted(titles)
|
||||||
|
|
||||||
|
def get_stats(self) -> dict:
|
||||||
|
"""Get statistics about the vector store."""
|
||||||
|
count = self.collection.count()
|
||||||
|
|
||||||
|
# Get sample of metadatas to count unique articles
|
||||||
|
sample = self.collection.get(limit=10000, include=["metadatas"])
|
||||||
|
unique_articles = len(set(m["article_id"] for m in sample["metadatas"] if m))
|
||||||
|
|
||||||
|
return {
|
||||||
|
"total_chunks": count,
|
||||||
|
"unique_articles_sampled": unique_articles,
|
||||||
|
"persist_dir": str(self.persist_dir),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""CLI entry point for generating embeddings."""
|
||||||
|
articles_path = settings.data_dir / "articles.json"
|
||||||
|
|
||||||
|
if not articles_path.exists():
|
||||||
|
console.print(f"[red]Articles file not found: {articles_path}[/red]")
|
||||||
|
console.print("[yellow]Run 'python -m src.parser' first to parse XML dumps[/yellow]")
|
||||||
|
return
|
||||||
|
|
||||||
|
console.print(f"[cyan]Loading articles from {articles_path}...[/cyan]")
|
||||||
|
with open(articles_path, "r", encoding="utf-8") as f:
|
||||||
|
articles_data = json.load(f)
|
||||||
|
|
||||||
|
articles = [WikiArticle(**a) for a in articles_data]
|
||||||
|
console.print(f"[green]Loaded {len(articles)} articles[/green]")
|
||||||
|
|
||||||
|
store = WikiVectorStore()
|
||||||
|
store.add_articles(articles)
|
||||||
|
|
||||||
|
stats = store.get_stats()
|
||||||
|
console.print(f"[green]Vector store stats: {stats}[/green]")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
|
|
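The batch loop in `add_articles` embeds and stores `batch_size` chunks at a time. The slicing pattern can be sketched in isolation; `make_batches` is a hypothetical helper for illustration, not part of this repo:

```python
# Minimal sketch of the slicing used by add_articles: items are processed
# batch_size at a time, with the final batch holding the remainder.
def make_batches(items: list, batch_size: int = 100) -> list[list]:
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

batches = make_batches(list(range(232)), batch_size=100)
print([len(b) for b in batches])  # → [100, 100, 32]
```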
@@ -0,0 +1,467 @@
"""Article ingress pipeline - scrape, analyze, and draft wiki content."""

import json
import re
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

import httpx
import trafilatura
from bs4 import BeautifulSoup
from rich.console import Console

from .config import settings
from .embeddings import WikiVectorStore
from .llm import llm_client

console = Console()


@dataclass
class ScrapedArticle:
    """Represents a scraped external article."""

    url: str
    title: str
    content: str
    author: Optional[str] = None
    date: Optional[str] = None
    domain: str = ""
    word_count: int = 0

    def __post_init__(self):
        if not self.domain:
            self.domain = urlparse(self.url).netloc
        if not self.word_count:
            self.word_count = len(self.content.split())


@dataclass
class WikiMatch:
    """A matching wiki article for citation."""

    title: str
    article_id: int
    relevance_score: float
    categories: list[str]
    suggested_citation: str  # How to cite the scraped article in this wiki page


@dataclass
class DraftArticle:
    """A draft wiki article generated from scraped content."""

    title: str
    content: str  # MediaWiki formatted content
    categories: list[str]
    source_url: str
    source_title: str
    summary: str
    related_articles: list[str]  # Existing wiki articles to link to


@dataclass
class IngressResult:
    """Result of the ingress pipeline."""

    scraped: ScrapedArticle
    analysis: dict  # Topic analysis results
    wiki_matches: list[WikiMatch]  # Existing articles to update with citations
    draft_articles: list[DraftArticle]  # New articles to create
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        return {
            "scraped": asdict(self.scraped),
            "analysis": self.analysis,
            "wiki_matches": [asdict(m) for m in self.wiki_matches],
            "draft_articles": [asdict(d) for d in self.draft_articles],
            "timestamp": self.timestamp,
        }


class ArticleScraper:
    """Scrapes and extracts content from URLs."""

    async def scrape(self, url: str) -> ScrapedArticle:
        """Scrape article content from URL."""
        console.print(f"[cyan]Scraping: {url}[/cyan]")

        async with httpx.AsyncClient(
            timeout=30.0,
            follow_redirects=True,
            headers={
                "User-Agent": "Mozilla/5.0 (compatible; P2PWikiBot/1.0; +http://p2pfoundation.net)"
            },
        ) as client:
            response = await client.get(url)
            response.raise_for_status()
            html = response.text

        # Use trafilatura for main content extraction
        content = trafilatura.extract(
            html,
            include_comments=False,
            include_tables=True,
            no_fallback=False,
        )

        if not content:
            # Fallback to BeautifulSoup
            soup = BeautifulSoup(html, "html.parser")
            # Remove script and style elements
            for element in soup(["script", "style", "nav", "footer", "header"]):
                element.decompose()
            content = soup.get_text(separator="\n", strip=True)

        # Extract metadata
        soup = BeautifulSoup(html, "html.parser")

        title = ""
        title_tag = soup.find("title")
        if title_tag:
            title = title_tag.get_text(strip=True)
        # Try og:title
        og_title = soup.find("meta", property="og:title")
        if og_title and og_title.get("content"):
            title = og_title["content"]

        author = None
        author_meta = soup.find("meta", attrs={"name": "author"})
        if author_meta and author_meta.get("content"):
            author = author_meta["content"]

        date = None
        date_meta = soup.find("meta", attrs={"name": "date"}) or soup.find(
            "meta", property="article:published_time"
        )
        if date_meta and date_meta.get("content"):
            date = date_meta["content"]

        return ScrapedArticle(
            url=url,
            title=title,
            content=content or "",
            author=author,
            date=date,
        )


class ContentAnalyzer:
    """Analyzes scraped content for wiki relevance."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()

    async def analyze(self, article: ScrapedArticle) -> dict:
        """Analyze article for topics, concepts, and wiki relevance."""
        # Truncate very long articles for analysis
        content_for_analysis = article.content[:8000]

        analysis_prompt = f"""Analyze this article for potential wiki content about peer-to-peer culture, commons, alternative economics, and collaborative governance.

Article Title: {article.title}
Source: {article.domain}

Article Content:
{content_for_analysis}

Please provide your analysis in the following JSON format:
{{
    "main_topics": ["topic1", "topic2"],
    "key_concepts": ["concept1", "concept2"],
    "relevant_categories": ["category1", "category2"],
    "summary": "2-3 sentence summary",
    "wiki_relevance_score": 0.0-1.0,
    "suggested_article_titles": ["Title 1", "Title 2"],
    "key_quotes": ["notable quote 1", "notable quote 2"],
    "mentioned_organizations": ["org1", "org2"],
    "mentioned_people": ["person1", "person2"]
}}

Focus on topics relevant to:
- Peer-to-peer networks and culture
- Commons-based peer production
- Alternative economics and post-capitalism
- Cooperative business models
- Open source / free culture
- Collaborative governance
- Sustainability and ecology"""

        response = await llm_client.analyze(
            content=article.content[:8000],
            task=analysis_prompt,
            temperature=0.3,
        )

        # Parse JSON from response
        try:
            # Find JSON in response
            json_match = re.search(r"\{[\s\S]*\}", response)
            if json_match:
                analysis = json.loads(json_match.group())
            else:
                analysis = {"error": "Could not parse analysis", "raw": response}
        except json.JSONDecodeError:
            analysis = {"error": "Invalid JSON in analysis", "raw": response}

        return analysis

    async def find_wiki_matches(
        self, article: ScrapedArticle, analysis: dict, n_results: int = 10
    ) -> list[WikiMatch]:
        """Find existing wiki articles that could cite this content."""
        matches = []

        # Search using main topics and concepts
        search_terms = analysis.get("main_topics", []) + analysis.get("key_concepts", [])

        for term in search_terms[:5]:  # Limit searches
            results = self.vector_store.search(term, n_results=3)

            for result in results:
                title = result["metadata"].get("title", "Unknown")
                article_id = result["metadata"].get("article_id", 0)
                distance = result.get("distance", 1.0)

                # Skip if already added
                if any(m.title == title for m in matches):
                    continue

                # Calculate relevance (lower distance = higher relevance)
                relevance = max(0, 1 - distance)

                if relevance > 0.3:  # Threshold for relevance
                    matches.append(
                        WikiMatch(
                            title=title,
                            article_id=article_id,
                            relevance_score=relevance,
                            categories=result["metadata"]
                            .get("categories", "")
                            .split(","),
                            suggested_citation=f"See also: [{article.title}]({article.url})",
                        )
                    )

        # Sort by relevance and limit
        matches.sort(key=lambda m: m.relevance_score, reverse=True)
        return matches[:n_results]
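The scoring inside `find_wiki_matches` converts ChromaDB distances into relevance with `max(0, 1 - distance)`, drops anything at or below the 0.3 threshold, deduplicates by title, and sorts descending. A minimal self-contained sketch of that logic (`score_matches` is a hypothetical standalone helper, not part of this repo):

```python
def score_matches(results: list[dict], threshold: float = 0.3) -> list[tuple[str, float]]:
    """Mirror of find_wiki_matches' scoring: relevance = max(0, 1 - distance),
    filtered by threshold, deduplicated by title, sorted descending."""
    seen, scored = set(), []
    for r in results:
        title = r["metadata"].get("title", "Unknown")
        if title in seen:
            continue
        relevance = max(0, 1 - r.get("distance", 1.0))
        if relevance > threshold:
            seen.add(title)
            scored.append((title, relevance))
    return sorted(scored, key=lambda t: t[1], reverse=True)

hits = [
    {"metadata": {"title": "Commons"}, "distance": 0.2},   # relevance 0.8 -> kept
    {"metadata": {"title": "Markets"}, "distance": 0.9},   # relevance 0.1 -> dropped
    {"metadata": {"title": "Commons"}, "distance": 0.1},   # duplicate title -> skipped
]
print(score_matches(hits))
```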


class DraftGenerator:
    """Generates draft wiki articles from scraped content."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()

    async def generate_drafts(
        self,
        article: ScrapedArticle,
        analysis: dict,
        max_drafts: int = 3,
    ) -> list[DraftArticle]:
        """Generate draft wiki articles based on scraped content."""
        drafts = []

        suggested_titles = analysis.get("suggested_article_titles", [])
        if not suggested_titles:
            return drafts

        for title in suggested_titles[:max_drafts]:
            # Check if article already exists
            existing = self.vector_store.search(title, n_results=1)
            if existing and existing[0].get("distance", 1.0) < 0.1:
                console.print(f"[yellow]Skipping '{title}' - similar article exists[/yellow]")
                continue

            draft = await self._generate_single_draft(article, analysis, title)
            if draft:
                drafts.append(draft)

        return drafts

    async def _generate_single_draft(
        self,
        article: ScrapedArticle,
        analysis: dict,
        title: str,
    ) -> Optional[DraftArticle]:
        """Generate a single draft article."""
        # Find related existing articles
        related_search = self.vector_store.search(title, n_results=5)
        related_titles = [
            r["metadata"].get("title", "")
            for r in related_search
            if r.get("distance", 1.0) < 0.5
        ]

        categories = analysis.get("relevant_categories", [])
        summary = analysis.get("summary", "")

        draft_prompt = f"""Create a MediaWiki-formatted article for the P2P Foundation Wiki.

Article Title: {title}

Source Material:
Title: {article.title}
URL: {article.url}
Summary: {summary}

Key concepts to cover: {', '.join(analysis.get('key_concepts', []))}

Related existing wiki articles: {', '.join(related_titles)}

Categories to include: {', '.join(categories)}

Please write the wiki article in MediaWiki markup format with:
1. An introduction/definition section
2. A "Description" section with key information
3. Links to related wiki articles using [[Article Name]] format
4. A "Sources" section citing the original article
5. Category tags at the end using [[Category:Name]] format

The article should:
- Be encyclopedic and neutral in tone
- Focus on the P2P/commons aspects of the topic
- Be approximately 300-500 words
- Include internal wiki links to related concepts"""

        content = await llm_client.generate_draft(
            draft_prompt,
            system="You are a wiki editor for the P2P Foundation Wiki. Write clear, encyclopedic articles in MediaWiki markup format.",
            temperature=0.5,
        )

        return DraftArticle(
            title=title,
            content=content,
            categories=categories,
            source_url=article.url,
            source_title=article.title,
            summary=summary,
            related_articles=related_titles,
        )


class IngressPipeline:
    """Complete ingress pipeline for processing external articles."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()
        self.scraper = ArticleScraper()
        self.analyzer = ContentAnalyzer(self.vector_store)
        self.generator = DraftGenerator(self.vector_store)

    async def process(self, url: str) -> IngressResult:
        """Process a URL through the complete ingress pipeline."""
        console.print(f"[bold cyan]Processing: {url}[/bold cyan]")

        # Step 1: Scrape
        console.print("[cyan]Step 1/4: Scraping article...[/cyan]")
        scraped = await self.scraper.scrape(url)
        console.print(f"[green]Scraped: {scraped.title} ({scraped.word_count} words)[/green]")

        # Step 2: Analyze
        console.print("[cyan]Step 2/4: Analyzing content...[/cyan]")
        analysis = await self.analyzer.analyze(scraped)
        console.print(f"[green]Found {len(analysis.get('main_topics', []))} main topics[/green]")

        # Step 3: Find wiki matches
        console.print("[cyan]Step 3/4: Finding wiki matches...[/cyan]")
        matches = await self.analyzer.find_wiki_matches(scraped, analysis)
        console.print(f"[green]Found {len(matches)} potential wiki matches[/green]")

        # Step 4: Generate drafts
        console.print("[cyan]Step 4/4: Generating draft articles...[/cyan]")
        drafts = await self.generator.generate_drafts(scraped, analysis)
        console.print(f"[green]Generated {len(drafts)} draft articles[/green]")

        result = IngressResult(
            scraped=scraped,
            analysis=analysis,
            wiki_matches=matches,
            draft_articles=drafts,
        )

        # Save to review queue
        self._save_to_review_queue(result)

        return result

    def _save_to_review_queue(self, result: IngressResult):
        """Save ingress result to the review queue."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        domain = result.scraped.domain.replace(".", "_")
        filename = f"{timestamp}_{domain}.json"
        filepath = settings.review_queue_dir / filename

        with open(filepath, "w", encoding="utf-8") as f:
            json.dump(result.to_dict(), f, indent=2, ensure_ascii=False)

        console.print(f"[green]Saved to review queue: {filepath}[/green]")


def get_review_queue() -> list[dict]:
    """Get all items in the review queue."""
    queue_files = sorted(settings.review_queue_dir.glob("*.json"), reverse=True)

    items = []
    for filepath in queue_files:
        with open(filepath, "r", encoding="utf-8") as f:
            data = json.load(f)
        data["_filepath"] = str(filepath)
        items.append(data)

    return items


def approve_item(filepath: str, item_type: str, item_index: int) -> bool:
    """
    Approve an item from the review queue.

    Args:
        filepath: Path to the review queue JSON file
        item_type: "match" or "draft"
        item_index: Index of the item to approve

    Returns:
        True if successful
    """
    # For now, just mark as approved in the file
    # In production, this would push to MediaWiki API
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    if item_type == "match":
        if item_index < len(data.get("wiki_matches", [])):
            data["wiki_matches"][item_index]["approved"] = True
    elif item_type == "draft":
        if item_index < len(data.get("draft_articles", [])):
            data["draft_articles"][item_index]["approved"] = True

    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    return True


def reject_item(filepath: str, item_type: str, item_index: int) -> bool:
    """Reject an item from the review queue."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    if item_type == "match":
        if item_index < len(data.get("wiki_matches", [])):
            data["wiki_matches"][item_index]["rejected"] = True
    elif item_type == "draft":
        if item_index < len(data.get("draft_articles", [])):
            data["draft_articles"][item_index]["rejected"] = True

    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    return True
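`_save_to_review_queue` names each queue file with a timestamp plus the source domain, dots replaced by underscores. The naming scheme can be sketched on its own (`queue_filename` is a hypothetical helper for illustration):

```python
from datetime import datetime

def queue_filename(domain: str, now: datetime) -> str:
    """Sketch of _save_to_review_queue's naming: YYYYMMDD_HHMMSS plus the
    source domain with dots replaced by underscores."""
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{domain.replace('.', '_')}.json"

print(queue_filename("blog.p2pfoundation.net", datetime(2024, 1, 2, 3, 4, 5)))
# → 20240102_030405_blog_p2pfoundation_net.json
```

Sorting these names lexicographically (as `get_review_queue` does with `reverse=True`) yields newest-first order, since the timestamp prefix is zero-padded.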
@@ -0,0 +1,153 @@
"""LLM client with hybrid routing between Ollama and Claude."""

from typing import AsyncIterator, Optional
import httpx
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential

from .config import settings


class LLMClient:
    """Unified LLM client with hybrid routing."""

    def __init__(self):
        self.ollama_url = settings.ollama_base_url
        self.ollama_model = settings.ollama_model

        # Initialize Claude client if API key is set
        self.claude_client = None
        if settings.anthropic_api_key:
            self.claude_client = Anthropic(api_key=settings.anthropic_api_key)

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _call_ollama(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: int = 2048,
    ) -> str:
        """Call Ollama API."""
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})

        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(
                f"{self.ollama_url}/api/chat",
                json={
                    "model": self.ollama_model,
                    "messages": messages,
                    "stream": False,
                    "options": {
                        "temperature": temperature,
                        "num_predict": max_tokens,
                    },
                },
            )
            response.raise_for_status()
            data = response.json()
            return data["message"]["content"]

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _call_claude(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: int = 4096,
    ) -> str:
        """Call Claude API."""
        if not self.claude_client:
            raise ValueError("Claude API key not configured")

        message = self.claude_client.messages.create(
            model=settings.claude_model,
            max_tokens=max_tokens,
            system=system or "",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        return message.content[0].text

    async def chat(
        self,
        prompt: str,
        system: Optional[str] = None,
        use_claude: bool = False,
        temperature: float = 0.7,
        max_tokens: int = 2048,
    ) -> str:
        """
        Chat with LLM using hybrid routing.

        Args:
            prompt: User prompt
            system: System prompt
            use_claude: Force Claude API (otherwise uses Ollama by default)
            temperature: Sampling temperature
            max_tokens: Max response tokens

        Returns:
            LLM response text
        """
        if use_claude and self.claude_client:
            return await self._call_claude(prompt, system, temperature, max_tokens)
        else:
            return await self._call_ollama(prompt, system, temperature, max_tokens)

    async def generate_draft(
        self,
        prompt: str,
        system: Optional[str] = None,
        temperature: float = 0.5,
    ) -> str:
        """
        Generate article draft - uses Claude for higher quality.

        Args:
            prompt: Prompt describing what to generate
            system: System prompt for context
            temperature: Lower for more factual output

        Returns:
            Generated draft text
        """
        # Use Claude for drafts if configured, otherwise fall back to Ollama
        use_claude = settings.use_claude_for_drafts and self.claude_client is not None
        return await self.chat(
            prompt, system, use_claude=use_claude, temperature=temperature, max_tokens=4096
        )

    async def analyze(
        self,
        content: str,
        task: str,
        temperature: float = 0.3,
    ) -> str:
        """
        Analyze content for a specific task - uses Claude for complex analysis.

        Args:
            content: Content to analyze
            task: Description of analysis task
            temperature: Lower for more deterministic output

        Returns:
            Analysis result
        """
        prompt = f"""Task: {task}

Content to analyze:
{content}

Provide your analysis:"""

        use_claude = self.claude_client is not None
        return await self.chat(prompt, use_claude=use_claude, temperature=temperature)


# Singleton instance
llm_client = LLMClient()
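The routing rule in `LLMClient.chat` is: Claude handles the request only when it is both requested and configured; in every other case the call silently falls through to Ollama. The decision table can be sketched as a pure function (`route` is a hypothetical helper, not part of this repo):

```python
def route(use_claude_flag: bool, claude_available: bool) -> str:
    """Decision behind LLMClient.chat: Claude only when requested AND
    configured; otherwise fall back to the local Ollama model."""
    return "claude" if (use_claude_flag and claude_available) else "ollama"

print(route(True, True))    # → claude
print(route(True, False))   # → ollama  (no API key: silent fallback)
print(route(False, True))   # → ollama  (Claude configured but not requested)
```

One consequence worth noting: with `USE_CLAUDE_FOR_DRAFTS=true` but no `ANTHROPIC_API_KEY`, drafts come from Ollama without any warning.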
@@ -0,0 +1,267 @@
"""MediaWiki XML dump parser - converts to structured JSON."""

import json
import re
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Iterator
from lxml import etree
from rich.progress import Progress, TaskID
from rich.console import Console

from .config import settings

console = Console()

# MediaWiki namespace
MW_NS = {"mw": "http://www.mediawiki.org/xml/export-0.6/"}


@dataclass
class WikiArticle:
    """Represents a parsed wiki article."""

    id: int
    title: str
    content: str  # Raw wikitext
    plain_text: str  # Cleaned plain text for embedding
    categories: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)  # Internal wiki links
    external_links: list[str] = field(default_factory=list)
    timestamp: str = ""
    contributor: str = ""

    def to_dict(self) -> dict:
        return asdict(self)


def clean_wikitext(text: str) -> str:
    """Convert MediaWiki markup to plain text for embedding."""
    if not text:
        return ""

    # Remove templates {{...}}
    text = re.sub(r"\{\{[^}]+\}\}", "", text)

    # Remove categories [[Category:...]]
    text = re.sub(r"\[\[Category:[^\]]+\]\]", "", text, flags=re.IGNORECASE)

    # Convert wiki links [[Page|Display]] or [[Page]] to just the display text
    text = re.sub(r"\[\[([^|\]]+)\|([^\]]+)\]\]", r"\2", text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)

    # Remove external links [url text] -> text
    text = re.sub(r"\[https?://[^\s\]]+ ([^\]]+)\]", r"\1", text)
    text = re.sub(r"\[https?://[^\]]+\]", "", text)

    # Remove wiki formatting
    text = re.sub(r"'''?([^']+)'''?", r"\1", text)  # Bold/italic
    text = re.sub(r"={2,}([^=]+)={2,}", r"\1", text)  # Headers
    text = re.sub(r"^[*#:;]+", "", text, flags=re.MULTILINE)  # List markers

    # Remove HTML tags
    text = re.sub(r"<[^>]+>", "", text)

    # Clean up whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = re.sub(r" {2,}", " ", text)

    return text.strip()
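Two of `clean_wikitext`'s transforms can be demonstrated on a sample string, using the same regexes in the same order: category tags are dropped entirely, piped links keep only their display text, and bold markers are stripped:

```python
import re

sample = "'''Commons''' are [[Commons|shared resources]]. [[Category:P2P]]"
sample = re.sub(r"\[\[Category:[^\]]+\]\]", "", sample, flags=re.IGNORECASE)
sample = re.sub(r"\[\[([^|\]]+)\|([^\]]+)\]\]", r"\2", sample)  # keep display text
sample = re.sub(r"'''?([^']+)'''?", r"\1", sample)              # strip bold/italic
print(sample.strip())  # → Commons are shared resources.
```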


def extract_categories(text: str) -> list[str]:
    """Extract category names from wikitext."""
    pattern = r"\[\[Category:([^\]|]+)"
    return list(set(re.findall(pattern, text, re.IGNORECASE)))


def extract_wiki_links(text: str) -> list[str]:
    """Extract internal wiki links from wikitext."""
    # Match [[Page]] or [[Page|Display]]
    pattern = r"\[\[([^|\]]+)"
    links = re.findall(pattern, text)
    # Filter out categories and files
    return list(
        set(
            link.strip()
            for link in links
            if not link.lower().startswith(("category:", "file:", "image:"))
        )
    )


def extract_external_links(text: str) -> list[str]:
    """Extract external URLs from wikitext."""
    pattern = r"https?://[^\s\]\)\"']+"
    return list(set(re.findall(pattern, text)))


def parse_xml_file(xml_path: Path) -> Iterator[WikiArticle]:
    """Parse a MediaWiki XML dump file and yield articles."""
    context = etree.iterparse(
        str(xml_path), events=("end",), tag="{http://www.mediawiki.org/xml/export-0.6/}page"
    )

    for event, page in context:
        # Get basic info
        title_elem = page.find("mw:title", MW_NS)
        id_elem = page.find("mw:id", MW_NS)
        ns_elem = page.find("mw:ns", MW_NS)

        # Skip non-main namespace pages (talk, user, etc.)
        if ns_elem is not None and ns_elem.text != "0":
            page.clear()
            continue

        title = title_elem.text if title_elem is not None else ""
        page_id = int(id_elem.text) if id_elem is not None else 0

        # Get latest revision
        revision = page.find("mw:revision", MW_NS)
        if revision is None:
            page.clear()
            continue

        text_elem = revision.find("mw:text", MW_NS)
        timestamp_elem = revision.find("mw:timestamp", MW_NS)
        contributor = revision.find("mw:contributor", MW_NS)

        content = text_elem.text if text_elem is not None else ""
        timestamp = timestamp_elem.text if timestamp_elem is not None else ""

        contributor_name = ""
        if contributor is not None:
            username = contributor.find("mw:username", MW_NS)
            if username is not None:
                contributor_name = username.text or ""

        # Skip redirects and empty pages
        if not content or content.lower().startswith("#redirect"):
            page.clear()
            continue

        article = WikiArticle(
            id=page_id,
            title=title,
            content=content,
            plain_text=clean_wikitext(content),
            categories=extract_categories(content),
            links=extract_wiki_links(content),
            external_links=extract_external_links(content),
            timestamp=timestamp,
            contributor=contributor_name,
        )

        # Clear element to free memory
        page.clear()

        yield article


def parse_all_dumps(output_path: Path | None = None) -> list[WikiArticle]:
    """Parse all XML dump files and optionally save to JSON."""
    xml_files = sorted(settings.xmldump_dir.glob("*.xml"))
|
||||||
|
|
||||||
|
if not xml_files:
|
||||||
|
console.print(f"[red]No XML files found in {settings.xmldump_dir}[/red]")
|
||||||
|
return []
|
||||||
|
|
||||||
|
console.print(f"[green]Found {len(xml_files)} XML files to parse[/green]")
|
||||||
|
|
||||||
|
all_articles = []
|
||||||
|
|
||||||
|
with Progress() as progress:
|
||||||
|
task = progress.add_task("[cyan]Parsing XML files...", total=len(xml_files))
|
||||||
|
|
||||||
|
for xml_file in xml_files:
|
||||||
|
progress.update(task, description=f"[cyan]Parsing {xml_file.name}...")
|
||||||
|
|
||||||
|
for article in parse_xml_file(xml_file):
|
||||||
|
all_articles.append(article)
|
||||||
|
|
||||||
|
progress.advance(task)
|
||||||
|
|
||||||
|
console.print(f"[green]Parsed {len(all_articles)} articles[/green]")
|
||||||
|
|
||||||
|
if output_path:
|
||||||
|
console.print(f"[cyan]Saving to {output_path}...[/cyan]")
|
||||||
|
with open(output_path, "w", encoding="utf-8") as f:
|
||||||
|
json.dump([a.to_dict() for a in all_articles], f, ensure_ascii=False, indent=2)
|
||||||
|
console.print(f"[green]Saved {len(all_articles)} articles to {output_path}[/green]")
|
||||||
|
|
||||||
|
return all_articles
|
||||||
|
|
||||||
|
|
||||||
|
def parse_mediawiki_files(articles_dir: Path, output_path: Path | None = None) -> list[WikiArticle]:
|
||||||
|
"""Parse individual .mediawiki files from a directory (Codeberg format)."""
|
||||||
|
mediawiki_files = list(articles_dir.glob("*.mediawiki"))
|
||||||
|
|
||||||
|
if not mediawiki_files:
|
||||||
|
console.print(f"[red]No .mediawiki files found in {articles_dir}[/red]")
|
||||||
|
return []
|
||||||
|
|
||||||
|
console.print(f"[green]Found {len(mediawiki_files)} .mediawiki files to parse[/green]")
|
||||||
|
|
||||||
|
all_articles = []
|
||||||
|
|
||||||
|
with Progress() as progress:
|
||||||
|
task = progress.add_task("[cyan]Parsing files...", total=len(mediawiki_files))
|
||||||
|
|
||||||
|
for i, filepath in enumerate(mediawiki_files):
|
||||||
|
# Title is the filename without extension
|
||||||
|
title = filepath.stem
|
||||||
|
|
||||||
|
try:
|
||||||
|
content = filepath.read_text(encoding="utf-8", errors="replace")
|
||||||
|
except Exception as e:
|
||||||
|
console.print(f"[yellow]Warning: Could not read {filepath}: {e}[/yellow]")
|
||||||
|
progress.advance(task)
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Skip redirects and empty files
|
||||||
|
if not content or content.strip().lower().startswith("#redirect"):
|
||||||
|
progress.advance(task)
|
||||||
|
continue
|
||||||
|
|
||||||
|
article = WikiArticle(
|
||||||
|
id=i,
|
||||||
|
title=title,
|
||||||
|
content=content,
|
||||||
|
plain_text=clean_wikitext(content),
|
||||||
|
categories=extract_categories(content),
|
||||||
|
links=extract_wiki_links(content),
|
||||||
|
external_links=extract_external_links(content),
|
||||||
|
timestamp="",
|
||||||
|
contributor="",
|
||||||
|
)
|
||||||
|
|
||||||
|
all_articles.append(article)
|
||||||
|
progress.advance(task)
|
||||||
|
|
||||||
|
console.print(f"[green]Parsed {len(all_articles)} articles[/green]")
|
||||||
|
|
||||||
|
if output_path:
|
||||||
|
console.print(f"[cyan]Saving to {output_path}...[/cyan]")
|
||||||
|
with open(output_path, "w", encoding="utf-8") as f:
|
||||||
|
json.dump([a.to_dict() for a in all_articles], f, ensure_ascii=False, indent=2)
|
||||||
|
console.print(f"[green]Saved {len(all_articles)} articles to {output_path}[/green]")
|
||||||
|
|
||||||
|
return all_articles
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""CLI entry point for parsing wiki content."""
|
||||||
|
output_path = settings.data_dir / "articles.json"
|
||||||
|
|
||||||
|
# Check for Codeberg-style articles directory first (newer, more complete)
|
||||||
|
articles_dir = settings.project_root / "articles" / "articles"
|
||||||
|
if articles_dir.exists():
|
||||||
|
console.print("[cyan]Found Codeberg-style articles directory, using that...[/cyan]")
|
||||||
|
parse_mediawiki_files(articles_dir, output_path)
|
||||||
|
else:
|
||||||
|
# Fall back to XML dumps
|
||||||
|
parse_all_dumps(output_path)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
|
|
@@ -0,0 +1,159 @@
"""RAG (Retrieval Augmented Generation) system for wiki Q&A."""

from dataclasses import dataclass
from typing import Optional

from .embeddings import WikiVectorStore
from .llm import llm_client


SYSTEM_PROMPT = """You are a knowledgeable assistant for the P2P Foundation Wiki, a comprehensive knowledge base about peer-to-peer culture, commons-based peer production, alternative economics, and collaborative governance.

Your role is to answer questions about the wiki content accurately and helpfully. When answering:

1. Base your answers on the provided wiki content excerpts
2. Cite specific articles when relevant (use the article titles)
3. If the provided content doesn't fully answer the question, say so
4. Explain concepts in accessible language while maintaining accuracy
5. Connect related concepts when helpful

If asked about something not covered in the provided content, acknowledge this and suggest related topics that might be helpful."""


@dataclass
class ChatMessage:
    """A chat message."""

    role: str  # "user" or "assistant"
    content: str


@dataclass
class RAGResponse:
    """Response from the RAG system."""

    answer: str
    sources: list[dict]  # List of source articles used
    query: str


class WikiRAG:
    """RAG system for answering questions about wiki content."""

    def __init__(self, vector_store: Optional[WikiVectorStore] = None):
        self.vector_store = vector_store or WikiVectorStore()
        self.conversation_history: list[ChatMessage] = []

    def _format_context(self, search_results: list[dict]) -> str:
        """Format search results as context for the LLM."""
        if not search_results:
            return "No relevant wiki content found for this query."

        context_parts = []
        for i, result in enumerate(search_results, 1):
            title = result["metadata"].get("title", "Unknown")
            content = result["content"]
            categories = result["metadata"].get("categories", "")

            context_parts.append(
                f"[Source {i}: {title}]\n"
                f"Categories: {categories}\n"
                f"Content:\n{content}\n"
            )

        return "\n---\n".join(context_parts)

    def _build_prompt(self, query: str, context: str) -> str:
        """Build the prompt for the LLM."""
        # Include recent conversation history for context
        history_text = ""
        if self.conversation_history:
            recent = self.conversation_history[-4:]  # Last 2 exchanges
            history_text = "\n\nRecent conversation:\n"
            for msg in recent:
                role = "User" if msg.role == "user" else "Assistant"
                # Truncate long messages
                content = msg.content[:500] + "..." if len(msg.content) > 500 else msg.content
                history_text += f"{role}: {content}\n"

        return f"""Based on the following wiki content, please answer the user's question.

Wiki Content:
{context}
{history_text}
User Question: {query}

Please provide a helpful answer based on the wiki content above. Cite specific articles when relevant."""

    async def ask(
        self,
        query: str,
        n_results: int = 5,
        filter_categories: Optional[list[str]] = None,
    ) -> RAGResponse:
        """
        Ask a question and get an answer based on wiki content.

        Args:
            query: User's question
            n_results: Number of relevant chunks to retrieve
            filter_categories: Optional category filter

        Returns:
            RAGResponse with answer and sources
        """
        # Search for relevant content
        search_results = self.vector_store.search(
            query, n_results=n_results, filter_categories=filter_categories
        )

        # Format context
        context = self._format_context(search_results)

        # Build prompt
        prompt = self._build_prompt(query, context)

        # Get LLM response (use Ollama for chat by default)
        answer = await llm_client.chat(
            prompt,
            system=SYSTEM_PROMPT,
            use_claude=False,  # Use Ollama for chat
            temperature=0.7,
        )

        # Update conversation history
        self.conversation_history.append(ChatMessage(role="user", content=query))
        self.conversation_history.append(ChatMessage(role="assistant", content=answer))

        # Extract unique sources
        sources = []
        seen_titles = set()
        for result in search_results:
            title = result["metadata"].get("title", "Unknown")
            if title not in seen_titles:
                seen_titles.add(title)
                sources.append(
                    {
                        "title": title,
                        "article_id": result["metadata"].get("article_id"),
                        "categories": result["metadata"].get("categories", "").split(","),
                    }
                )

        return RAGResponse(answer=answer, sources=sources, query=query)

    def clear_history(self):
        """Clear conversation history."""
        self.conversation_history = []

    def get_suggestions(self, partial_query: str, n_results: int = 5) -> list[str]:
        """Get article title suggestions for autocomplete."""
        # Simple substring matching on titles
        all_titles = self.vector_store.get_article_titles()
        partial_lower = partial_query.lower()

        suggestions = [
            title for title in all_titles if partial_lower in title.lower()
        ][:n_results]

        return suggestions
@@ -0,0 +1,707 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>P2P Wiki AI</title>
    <style>
        :root {
            --bg-primary: #1a1a2e;
            --bg-secondary: #16213e;
            --bg-tertiary: #0f3460;
            --text-primary: #e8e8e8;
            --text-secondary: #a0a0a0;
            --accent: #e94560;
            --accent-hover: #ff6b6b;
            --success: #4ecdc4;
            --border: #2a2a4a;
        }

        * {
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }

        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
            background: var(--bg-primary);
            color: var(--text-primary);
            min-height: 100vh;
        }

        .container {
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }

        header {
            display: flex;
            justify-content: space-between;
            align-items: center;
            padding: 20px 0;
            border-bottom: 1px solid var(--border);
            margin-bottom: 30px;
        }

        h1 {
            font-size: 1.8em;
            font-weight: 600;
        }

        h1 span {
            color: var(--accent);
        }

        .tabs {
            display: flex;
            gap: 10px;
        }

        .tab {
            padding: 10px 20px;
            background: var(--bg-secondary);
            border: 1px solid var(--border);
            border-radius: 8px;
            cursor: pointer;
            transition: all 0.2s;
        }

        .tab:hover, .tab.active {
            background: var(--bg-tertiary);
            border-color: var(--accent);
        }

        .panel {
            display: none;
        }

        .panel.active {
            display: block;
        }

        /* Chat Panel */
        .chat-container {
            display: flex;
            flex-direction: column;
            height: calc(100vh - 200px);
            background: var(--bg-secondary);
            border-radius: 12px;
            overflow: hidden;
        }

        .chat-messages {
            flex: 1;
            overflow-y: auto;
            padding: 20px;
        }

        .message {
            margin-bottom: 20px;
            max-width: 80%;
        }

        .message.user {
            margin-left: auto;
        }

        .message-content {
            padding: 15px;
            border-radius: 12px;
            line-height: 1.6;
        }

        .message.user .message-content {
            background: var(--bg-tertiary);
        }

        .message.assistant .message-content {
            background: var(--bg-primary);
            border: 1px solid var(--border);
        }

        .message-sources {
            margin-top: 10px;
            padding: 10px;
            background: rgba(233, 69, 96, 0.1);
            border-radius: 8px;
            font-size: 0.9em;
        }

        .message-sources h4 {
            color: var(--accent);
            margin-bottom: 5px;
        }

        .source-tag {
            display: inline-block;
            padding: 3px 8px;
            margin: 2px;
            background: var(--bg-tertiary);
            border-radius: 4px;
            font-size: 0.85em;
        }

        .chat-input {
            display: flex;
            gap: 10px;
            padding: 20px;
            background: var(--bg-primary);
            border-top: 1px solid var(--border);
        }

        .chat-input input {
            flex: 1;
            padding: 15px;
            background: var(--bg-secondary);
            border: 1px solid var(--border);
            border-radius: 8px;
            color: var(--text-primary);
            font-size: 1em;
        }

        .chat-input input:focus {
            outline: none;
            border-color: var(--accent);
        }

        .chat-input button {
            padding: 15px 30px;
            background: var(--accent);
            border: none;
            border-radius: 8px;
            color: white;
            font-weight: 600;
            cursor: pointer;
            transition: background 0.2s;
        }

        .chat-input button:hover {
            background: var(--accent-hover);
        }

        .chat-input button:disabled {
            opacity: 0.5;
            cursor: not-allowed;
        }

        /* Ingress Panel */
        .ingress-container {
            background: var(--bg-secondary);
            border-radius: 12px;
            padding: 30px;
        }

        .ingress-form {
            display: flex;
            gap: 10px;
            margin-bottom: 30px;
        }

        .ingress-form input {
            flex: 1;
            padding: 15px;
            background: var(--bg-primary);
            border: 1px solid var(--border);
            border-radius: 8px;
            color: var(--text-primary);
            font-size: 1em;
        }

        .ingress-form input:focus {
            outline: none;
            border-color: var(--accent);
        }

        .ingress-form button {
            padding: 15px 30px;
            background: var(--success);
            border: none;
            border-radius: 8px;
            color: var(--bg-primary);
            font-weight: 600;
            cursor: pointer;
            transition: opacity 0.2s;
        }

        .ingress-form button:hover {
            opacity: 0.9;
        }

        .ingress-form button:disabled {
            opacity: 0.5;
            cursor: not-allowed;
        }

        .ingress-result {
            background: var(--bg-primary);
            border-radius: 8px;
            padding: 20px;
            margin-bottom: 20px;
        }

        .ingress-result h3 {
            margin-bottom: 15px;
            color: var(--accent);
        }

        .result-stats {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
            gap: 15px;
            margin-bottom: 20px;
        }

        .stat {
            background: var(--bg-secondary);
            padding: 15px;
            border-radius: 8px;
            text-align: center;
        }

        .stat-value {
            font-size: 2em;
            font-weight: bold;
            color: var(--success);
        }

        .stat-label {
            color: var(--text-secondary);
            font-size: 0.9em;
        }

        /* Review Panel */
        .review-container {
            background: var(--bg-secondary);
            border-radius: 12px;
            padding: 30px;
        }

        .review-item {
            background: var(--bg-primary);
            border-radius: 8px;
            padding: 20px;
            margin-bottom: 20px;
        }

        .review-item h3 {
            margin-bottom: 10px;
        }

        .review-meta {
            color: var(--text-secondary);
            font-size: 0.9em;
            margin-bottom: 15px;
        }

        .review-section {
            margin-top: 20px;
            padding-top: 20px;
            border-top: 1px solid var(--border);
        }

        .review-section h4 {
            margin-bottom: 10px;
            color: var(--accent);
        }

        .match-item, .draft-item {
            background: var(--bg-secondary);
            padding: 15px;
            border-radius: 8px;
            margin-bottom: 10px;
        }

        .match-item .title, .draft-item .title {
            font-weight: 600;
            margin-bottom: 5px;
        }

        .match-item .score {
            color: var(--success);
        }

        .action-buttons {
            display: flex;
            gap: 10px;
            margin-top: 10px;
        }

        .btn-approve {
            padding: 8px 16px;
            background: var(--success);
            border: none;
            border-radius: 4px;
            color: var(--bg-primary);
            cursor: pointer;
        }

        .btn-reject {
            padding: 8px 16px;
            background: var(--accent);
            border: none;
            border-radius: 4px;
            color: white;
            cursor: pointer;
        }

        .loading {
            display: inline-block;
            width: 20px;
            height: 20px;
            border: 2px solid var(--text-secondary);
            border-top-color: var(--accent);
            border-radius: 50%;
            animation: spin 1s linear infinite;
        }

        @keyframes spin {
            to { transform: rotate(360deg); }
        }

        .empty-state {
            text-align: center;
            padding: 50px;
            color: var(--text-secondary);
        }

        /* Markdown-like formatting */
        .message-content p { margin-bottom: 10px; }
        .message-content ul, .message-content ol { margin-left: 20px; margin-bottom: 10px; }
        .message-content code { background: var(--bg-tertiary); padding: 2px 6px; border-radius: 4px; }
        .message-content pre { background: var(--bg-tertiary); padding: 15px; border-radius: 8px; overflow-x: auto; }
    </style>
</head>
<body>
    <div class="container">
        <header>
            <h1>P2P Wiki <span>AI</span></h1>
            <div class="tabs">
                <div class="tab active" data-panel="chat">Chat</div>
                <div class="tab" data-panel="ingress">Ingress</div>
                <div class="tab" data-panel="review">Review Queue</div>
            </div>
        </header>

        <!-- Chat Panel -->
        <div id="chat" class="panel active">
            <div class="chat-container">
                <div class="chat-messages" id="chatMessages">
                    <div class="message assistant">
                        <div class="message-content">
                            <p>Welcome to the P2P Wiki AI assistant! I can help you explore the P2P Foundation Wiki's knowledge about peer-to-peer culture, commons-based peer production, alternative economics, and collaborative governance.</p>
                            <p>Ask me anything about these topics!</p>
                        </div>
                    </div>
                </div>
                <div class="chat-input">
                    <input type="text" id="chatInput" placeholder="Ask about P2P, commons, cooperative economics..." />
                    <button id="chatSend">Send</button>
                </div>
            </div>
        </div>

        <!-- Ingress Panel -->
        <div id="ingress" class="panel">
            <div class="ingress-container">
                <h2>Article Ingress</h2>
                <p style="color: var(--text-secondary); margin-bottom: 20px;">
                    Drop an article URL to analyze it for wiki content. The AI will identify relevant topics,
                    find matching wiki articles for citations, and draft new articles.
                </p>
                <div class="ingress-form">
                    <input type="url" id="ingressUrl" placeholder="https://example.com/article-about-commons" />
                    <button id="ingressSubmit">Process Article</button>
                </div>
                <div id="ingressResult"></div>
            </div>
        </div>

        <!-- Review Panel -->
        <div id="review" class="panel">
            <div class="review-container">
                <h2>Review Queue</h2>
                <p style="color: var(--text-secondary); margin-bottom: 20px;">
                    Review and approve AI-generated wiki content before it's added to the wiki.
                </p>
                <div id="reviewItems">
                    <div class="empty-state">Loading review items...</div>
                </div>
            </div>
        </div>
    </div>

    <script>
        const API_BASE = ''; // Same origin

        // Tab switching
        document.querySelectorAll('.tab').forEach(tab => {
            tab.addEventListener('click', () => {
                document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
                document.querySelectorAll('.panel').forEach(p => p.classList.remove('active'));
                tab.classList.add('active');
                document.getElementById(tab.dataset.panel).classList.add('active');

                // Load review items when switching to review tab
                if (tab.dataset.panel === 'review') {
                    loadReviewItems();
                }
            });
        });

        // Chat functionality
        const chatMessages = document.getElementById('chatMessages');
        const chatInput = document.getElementById('chatInput');
        const chatSend = document.getElementById('chatSend');

        function addMessage(content, role, sources = []) {
            const div = document.createElement('div');
            div.className = `message ${role}`;

            let html = `<div class="message-content">${formatMessage(content)}</div>`;

            if (sources.length > 0) {
                html += `<div class="message-sources">
                    <h4>Sources</h4>
                    ${sources.map(s => `<span class="source-tag">${s.title}</span>`).join('')}
                </div>`;
            }

            div.innerHTML = html;
            chatMessages.appendChild(div);
            chatMessages.scrollTop = chatMessages.scrollHeight;
        }

        function formatMessage(text) {
            // Basic markdown-like formatting
            return text
                .replace(/\n\n/g, '</p><p>')
                .replace(/\n/g, '<br>')
                .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
                .replace(/\*(.+?)\*/g, '<em>$1</em>')
                .replace(/`(.+?)`/g, '<code>$1</code>');
        }

        async function sendChat() {
            const query = chatInput.value.trim();
            if (!query) return;

            chatInput.value = '';
            chatSend.disabled = true;

            addMessage(query, 'user');

            // Show loading
            const loadingDiv = document.createElement('div');
            loadingDiv.className = 'message assistant';
            loadingDiv.innerHTML = '<div class="message-content"><span class="loading"></span> Thinking...</div>';
            chatMessages.appendChild(loadingDiv);
            chatMessages.scrollTop = chatMessages.scrollHeight;

            try {
                const response = await fetch(`${API_BASE}/chat`, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ query, n_results: 5 })
                });

                const data = await response.json();

                chatMessages.removeChild(loadingDiv);

                if (response.ok) {
                    addMessage(data.answer, 'assistant', data.sources);
                } else {
                    addMessage(`Error: ${data.detail || 'Something went wrong'}`, 'assistant');
                }
            } catch (error) {
                chatMessages.removeChild(loadingDiv);
                addMessage(`Error: ${error.message}`, 'assistant');
            }

            chatSend.disabled = false;
            chatInput.focus();
        }

        chatSend.addEventListener('click', sendChat);
        chatInput.addEventListener('keypress', (e) => {
            if (e.key === 'Enter') sendChat();
        });

        // Ingress functionality
        const ingressUrl = document.getElementById('ingressUrl');
        const ingressSubmit = document.getElementById('ingressSubmit');
        const ingressResult = document.getElementById('ingressResult');

        async function processIngress() {
            const url = ingressUrl.value.trim();
            if (!url) return;

            ingressSubmit.disabled = true;
            ingressSubmit.textContent = 'Processing...';

            ingressResult.innerHTML = `
                <div class="ingress-result">
                    <h3>Processing Article</h3>
                    <p><span class="loading"></span> Scraping and analyzing content...</p>
                </div>
            `;

            try {
                const response = await fetch(`${API_BASE}/ingress`, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ url })
                });

                const data = await response.json();

                if (response.ok) {
                    ingressResult.innerHTML = `
                        <div class="ingress-result">
                            <h3>Analysis Complete: ${data.scraped_title || 'Article'}</h3>
                            <div class="result-stats">
                                <div class="stat">
                                    <div class="stat-value">${data.topics_found}</div>
                                    <div class="stat-label">Topics Found</div>
                                </div>
                                <div class="stat">
                                    <div class="stat-value">${data.wiki_matches}</div>
                                    <div class="stat-label">Wiki Matches</div>
                                </div>
                                <div class="stat">
                                    <div class="stat-value">${data.drafts_generated}</div>
                                    <div class="stat-label">Drafts Generated</div>
                                </div>
                            </div>
                            <p style="color: var(--success);">
                                Results added to review queue. Check the Review tab to approve or reject suggestions.
                            </p>
                        </div>
                    `;
                } else {
                    ingressResult.innerHTML = `
                        <div class="ingress-result">
                            <h3 style="color: var(--accent);">Error</h3>
                            <p>${data.detail || 'Failed to process article'}</p>
                        </div>
                    `;
                }
            } catch (error) {
                ingressResult.innerHTML = `
                    <div class="ingress-result">
                        <h3 style="color: var(--accent);">Error</h3>
                        <p>${error.message}</p>
                    </div>
                `;
            }

            ingressSubmit.disabled = false;
            ingressSubmit.textContent = 'Process Article';
        }

        ingressSubmit.addEventListener('click', processIngress);
        ingressUrl.addEventListener('keypress', (e) => {
            if (e.key === 'Enter') processIngress();
        });

        // Review functionality
        const reviewItems = document.getElementById('reviewItems');

        async function loadReviewItems() {
            try {
                const response = await fetch(`${API_BASE}/review`);
                const data = await response.json();

                if (data.count === 0) {
                    reviewItems.innerHTML = '<div class="empty-state">No items in the review queue.</div>';
                    return;
                }

                reviewItems.innerHTML = data.items.map(item => `
                    <div class="review-item">
                        <h3>${item.scraped?.title || 'Unknown Article'}</h3>
                        <div class="review-meta">
                            Source: <a href="${item.scraped?.url}" target="_blank">${item.scraped?.domain}</a>
                            | Processed: ${new Date(item.timestamp).toLocaleString()}
                        </div>

                        ${item.wiki_matches?.length > 0 ? `
                        <div class="review-section">
                            <h4>Suggested Citations (${item.wiki_matches.length})</h4>
                            ${item.wiki_matches.map((match, i) => `
                                <div class="match-item" ${match.approved ? 'style="opacity: 0.5"' : ''}>
                                    <div class="title">${match.title}</div>
                                    <div class="score">Relevance: ${(match.relevance_score * 100).toFixed(0)}%</div>
                                    <div>${match.suggested_citation}</div>
                                    ${!match.approved && !match.rejected ? `
                                    <div class="action-buttons">
                                        <button class="btn-approve" onclick="reviewAction('${item._filepath}', 'match', ${i}, 'approve')">Approve</button>
                                        <button class="btn-reject" onclick="reviewAction('${item._filepath}', 'match', ${i}, 'reject')">Reject</button>
                                    </div>
                                    ` : `<em>${match.approved ? 'Approved' : 'Rejected'}</em>`}
                                </div>
                            `).join('')}
                        </div>
                        ` : ''}

                        ${item.draft_articles?.length > 0 ? `
                        <div class="review-section">
                            <h4>Draft Articles (${item.draft_articles.length})</h4>
                            ${item.draft_articles.map((draft, i) => `
                                <div class="draft-item" ${draft.approved ? 'style="opacity: 0.5"' : ''}>
                                    <div class="title">${draft.title}</div>
                                    <div style="color: var(--text-secondary); font-size: 0.9em; margin-bottom: 10px;">
                                        ${draft.summary || ''}
                                    </div>
                                    <details>
                                        <summary style="cursor: pointer; color: var(--accent);">View Draft Content</summary>
|
||||||
|
<pre style="margin-top: 10px; white-space: pre-wrap; font-size: 0.85em;">${draft.content}</pre>
|
||||||
|
</details>
|
||||||
|
${!draft.approved && !draft.rejected ? `
|
||||||
|
<div class="action-buttons">
|
||||||
|
<button class="btn-approve" onclick="reviewAction('${item._filepath}', 'draft', ${i}, 'approve')">Approve</button>
|
||||||
|
<button class="btn-reject" onclick="reviewAction('${item._filepath}', 'draft', ${i}, 'reject')">Reject</button>
|
||||||
|
</div>
|
||||||
|
` : `<em>${draft.approved ? 'Approved' : 'Rejected'}</em>`}
|
||||||
|
</div>
|
||||||
|
`).join('')}
|
||||||
|
</div>
|
||||||
|
` : ''}
|
||||||
|
</div>
|
||||||
|
`).join('');
|
||||||
|
} catch (error) {
|
||||||
|
reviewItems.innerHTML = `<div class="empty-state">Error loading review items: ${error.message}</div>`;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function reviewAction(filepath, itemType, itemIndex, action) {
|
||||||
|
try {
|
||||||
|
const response = await fetch(`${API_BASE}/review/action`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({
|
||||||
|
filepath,
|
||||||
|
item_type: itemType,
|
||||||
|
item_index: itemIndex,
|
||||||
|
action
|
||||||
|
})
|
||||||
|
});
|
||||||
|
|
||||||
|
if (response.ok) {
|
||||||
|
loadReviewItems(); // Refresh the list
|
||||||
|
} else {
|
||||||
|
const data = await response.json();
|
||||||
|
alert(`Error: ${data.detail || 'Action failed'}`);
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
alert(`Error: ${error.message}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Make reviewAction available globally
|
||||||
|
window.reviewAction = reviewAction;
|
||||||
|
</script>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||