# P2P Wiki AI
AI-augmented system for the P2P Foundation Wiki with two main features:
1. **Conversational Agent** - Ask questions about the 23,000+ wiki articles using RAG (Retrieval Augmented Generation)
2. **Article Ingress Pipeline** - Drop article URLs to automatically analyze content, find matching wiki articles for citations, and generate draft articles
## Architecture
```
┌───────────────────────────────────────────────────────┐
│                  P2P Wiki AI System                   │
├───────────────────────────────────────────────────────┤
│                                                       │
│   ┌─────────────────┐      ┌─────────────────┐        │
│   │   Chat (Q&A)    │      │  Ingress Tool   │        │
│   │     via RAG     │      │   (URL Drop)    │        │
│   └────────┬────────┘      └────────┬────────┘        │
│            │                        │                 │
│            └───────────┬────────────┘                 │
│                        ▼                              │
│            ┌───────────────────────┐                  │
│            │    FastAPI Backend    │                  │
│            └───────────┬───────────┘                  │
│                        │                              │
│        ┌───────────────┼────────────────┐             │
│        ▼               ▼                ▼             │
│   ┌──────────┐  ┌─────────────┐  ┌──────────────┐     │
│   │ ChromaDB │  │   Ollama/   │  │   Article    │     │
│   │ (Vector) │  │   Claude    │  │   Scraper    │     │
│   └──────────┘  └─────────────┘  └──────────────┘     │
│                                                       │
└───────────────────────────────────────────────────────┘
```
## Quick Start
### 1. Prerequisites
- Python 3.10+
- [Ollama](https://ollama.ai) installed locally (or access to a remote Ollama server)
- Optional: an Anthropic API key for Claude (higher-quality article drafts)
### 2. Install Dependencies
```bash
cd /home/jeffe/Github/p2pwiki-content
pip install -e .
```
### 3. Parse Wiki Content
Convert the MediaWiki XML dumps to searchable JSON:
```bash
python -m src.parser
```
This creates `data/articles.json` with all parsed articles (~23,000 pages).
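The real parser lives in `src/parser.py`; as a rough illustration of what this step does, here is a minimal stdlib-only sketch that pulls `{title, text}` records out of a MediaWiki XML export (the element names follow the standard MediaWiki export schema; the sample dump and function names are placeholders, not the project's actual code):
```python
import json
import xml.etree.ElementTree as ET

# Tiny inline stand-in for an xmldump/ file.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Peer Production</title>
    <revision><text>Commons-based peer production is...</text></revision>
  </page>
</mediawiki>"""

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def parse_dump(xml_text: str) -> list[dict]:
    """Extract {title, text} records from a MediaWiki XML export."""
    root = ET.fromstring(xml_text)
    articles = []
    for page in root.iter(f"{NS}page"):
        articles.append({
            "title": page.findtext(f"{NS}title", default=""),
            "text": page.findtext(f"{NS}revision/{NS}text", default=""),
        })
    return articles

# The real pipeline writes the full list to data/articles.json.
print(json.dumps(parse_dump(SAMPLE_DUMP), indent=2))
```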
### 4. Generate Embeddings
Create the vector store for semantic search:
```bash
python -m src.embeddings
```
This creates the ChromaDB vector store in `data/chroma/`. Takes a few minutes.
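To show what the vector store buys you, here is a toy of the retrieval mechanics: embed documents, then rank by cosine similarity. The real system uses sentence-transformers embeddings inside ChromaDB; this sketch substitutes bag-of-words counts purely to keep the example self-contained.
```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: term counts instead of a neural sentence vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "Peer Production": "commons based peer production networks",
    "Cooperatives": "cooperative business models and economics",
}
query = embed("peer production")
ranked = sorted(docs, key=lambda t: cosine(query, embed(docs[t])), reverse=True)
print(ranked[0])  # best match first
```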
### 5. Configure Environment
```bash
cp .env.example .env
# Edit .env with your settings
```
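The actual settings live in `src/config.py`; a hedged sketch of how boolean `.env` flags might be read (only `USE_CLAUDE_FOR_DRAFTS` and `USE_OLLAMA_FOR_CHAT` are confirmed by this README — treat any other key as a placeholder):
```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean-ish environment variable ('true', '1', 'yes')."""
    return os.environ.get(name, str(default)).strip().lower() in {"true", "1", "yes"}

os.environ["USE_CLAUDE_FOR_DRAFTS"] = "true"   # as it would be set in .env
print(env_flag("USE_CLAUDE_FOR_DRAFTS"))       # True
print(env_flag("USE_OLLAMA_FOR_CHAT", default=True))
```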
### 6. Run the Server
```bash
python -m src.api
```
Visit http://localhost:8420/ui for the web interface.
## Docker Deployment
For production deployment on the RS 8000:
```bash
# Build and run
docker compose up -d --build
# Check logs
docker compose logs -f
# Access at http://localhost:8420/ui
# Or via Traefik at https://wiki-ai.jeffemmett.com
```
## API Endpoints
### Chat
```bash
# Ask a question
curl -X POST http://localhost:8420/chat \
-H "Content-Type: application/json" \
-d '{"query": "What is commons-based peer production?"}'
```
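The same call from Python, using only the standard library. This assumes the server from step 6 is running on port 8420; the helper names are illustrative, and the shape of the JSON response is not specified here.
```python
import json
import urllib.request

def build_request(query: str, base_url: str = "http://localhost:8420") -> urllib.request.Request:
    """Build the POST /chat request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(query: str) -> dict:
    with urllib.request.urlopen(build_request(query)) as resp:
        return json.load(resp)

# chat("What is commons-based peer production?")  # requires a running server
```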
### Ingress
```bash
# Process an external article
curl -X POST http://localhost:8420/ingress \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/article-about-cooperatives"}'
```
### Review Queue
```bash
# Get all items in review queue
curl http://localhost:8420/review
# Approve a draft article
curl -X POST http://localhost:8420/review/action \
-H "Content-Type: application/json" \
-d '{"filepath": "/path/to/item.json", "item_type": "draft", "item_index": 0, "action": "approve"}'
```
### Search
```bash
# Direct vector search
curl "http://localhost:8420/search?q=cooperative%20economics&n=10"
# List article titles
curl "http://localhost:8420/articles?limit=100"
```
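As the curl example shows, the search query must be percent-encoded; a small helper (hypothetical, stdlib-only) that builds the `/search` URL safely:
```python
from urllib.parse import urlencode

def search_url(q: str, n: int = 10, base: str = "http://localhost:8420") -> str:
    # urlencode handles spaces and other reserved characters in the query.
    return f"{base}/search?" + urlencode({"q": q, "n": n})

print(search_url("cooperative economics"))
# http://localhost:8420/search?q=cooperative+economics&n=10
```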
## Hybrid AI Routing
The system uses intelligent routing between local (Ollama) and cloud (Claude) LLMs:
| Task | Default LLM | Reasoning |
|------|-------------|-----------|
| Chat Q&A | Ollama | Fast, free, good enough for retrieval-based answers |
| Content Analysis | Claude | Better at extracting topics and identifying wiki relevance |
| Draft Generation | Claude | Higher quality article writing |
| Embeddings | Local (sentence-transformers) | Fast, free, optimized for semantic search |
Configure in `.env`:
```
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true
```
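The routing table above can be sketched as code. The flag semantics match the `.env` keys shown, but this dispatch function is an illustration, not the project's actual implementation:
```python
def pick_llm(task: str,
             use_claude_for_drafts: bool = True,
             use_ollama_for_chat: bool = True) -> str:
    """Route a task to the local or cloud LLM per the table above."""
    if task == "chat" and use_ollama_for_chat:
        return "ollama"
    if task in ("analysis", "draft") and use_claude_for_drafts:
        return "claude"
    return "ollama"  # local fallback when cloud use is disabled

print(pick_llm("chat"))    # ollama
print(pick_llm("draft"))   # claude
```
(Embeddings always stay local via sentence-transformers, so they never pass through this routing.)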
## Project Structure
```
p2pwiki-content/
├── src/
│ ├── api.py # FastAPI backend
│ ├── config.py # Configuration settings
│ ├── embeddings.py # Vector store (ChromaDB)
│ ├── ingress.py # Article scraper & analyzer
│ ├── llm.py # LLM client (Ollama/Claude)
│ ├── parser.py # MediaWiki XML parser
│ └── rag.py # RAG chat system
├── web/
│ └── index.html # Web UI
├── data/
│ ├── articles.json # Parsed wiki content
│ ├── chroma/ # Vector store
│ └── review_queue/ # Pending ingress items
├── xmldump/ # MediaWiki XML dumps
├── docker-compose.yml
├── Dockerfile
└── pyproject.toml
```
## Content Coverage
The P2P Foundation Wiki contains ~23,000 articles covering:
- Peer-to-peer networks and culture
- Commons-based peer production (CBPP)
- Alternative economics and post-capitalism
- Cooperative business models
- Open source and free culture
- Collaborative governance
- Sustainability and ecology
## License
The wiki content comes from the P2P Foundation Wiki and remains under its original licenses.
The AI system code is provided as-is for educational purposes.