# P2P Wiki AI

AI-augmented system for the P2P Foundation Wiki with two main features:

1. **Conversational Agent** - Ask questions about the 23,000+ wiki articles using RAG (Retrieval-Augmented Generation)
2. **Article Ingress Pipeline** - Drop article URLs to automatically analyze content, find matching wiki articles for citations, and generate draft articles
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       P2P Wiki AI System                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│    ┌─────────────────┐         ┌─────────────────┐              │
│    │   Chat (Q&A)    │         │  Ingress Tool   │              │
│    │     via RAG     │         │   (URL Drop)    │              │
│    └────────┬────────┘         └────────┬────────┘              │
│             │                           │                       │
│             └─────────────┬─────────────┘                       │
│                           ▼                                     │
│               ┌───────────────────────┐                         │
│               │    FastAPI Backend    │                         │
│               └───────────┬───────────┘                         │
│                           │                                     │
│            ┌──────────────┼──────────────┐                      │
│            ▼              ▼              ▼                      │
│      ┌───────────┐  ┌───────────┐  ┌───────────┐                │
│      │ ChromaDB  │  │  Ollama/  │  │  Article  │                │
│      │ (Vector)  │  │  Claude   │  │  Scraper  │                │
│      └───────────┘  └───────────┘  └───────────┘                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Quick Start

### 1. Prerequisites

- Python 3.10+
- [Ollama](https://ollama.ai) installed locally (or access to a remote Ollama server)
- Optional: Anthropic API key for Claude (higher-quality article drafts)

### 2. Install Dependencies

```bash
cd p2pwiki-content
pip install -e .
```
### 3. Parse Wiki Content

Convert the MediaWiki XML dumps to searchable JSON:

```bash
python -m src.parser
```

This creates `data/articles.json` with all parsed articles (~23,000 pages).
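The parsing step can be sketched with the standard library alone. This is a minimal illustration of pulling titles and wikitext out of a MediaWiki export, not the actual `src.parser` implementation; the export namespace version (`0.10` here) varies between dumps.

```python
# Minimal sketch of what the parsing step does: extract page titles and
# wikitext from a MediaWiki XML export. The real parser also handles
# redirects, namespaces, and template markup; names here are illustrative.
import xml.etree.ElementTree as ET

# MediaWiki exports are namespaced; the version suffix depends on the dump.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Commons-Based Peer Production</title>
    <revision><text>Coined by Yochai Benkler...</text></revision>
  </page>
</mediawiki>"""

def parse_pages(xml_text):
    """Yield (title, wikitext) tuples from a MediaWiki export string."""
    root = ET.fromstring(xml_text)
    for page in root.iter(NS + "page"):
        title = page.findtext(NS + "title", default="")
        text = page.findtext(f"{NS}revision/{NS}text", default="")
        yield title, text

articles = list(parse_pages(SAMPLE))
```

For the full 23,000-page dump, `ET.iterparse` with element clearing would keep memory flat instead of loading the whole tree.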
### 4. Generate Embeddings

Create the vector store for semantic search:

```bash
python -m src.embeddings
```

This creates the ChromaDB vector store in `data/chroma/`. Takes a few minutes.
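What the vector store does at query time is rank articles by the similarity of their embedding vectors to the query's embedding. A toy illustration, with fake 3-dimensional vectors standing in for real sentence-transformer embeddings:

```python
# Toy illustration of semantic search: embed texts as vectors, then rank
# by cosine similarity to the query vector. ChromaDB does this at scale;
# the 3-d vectors and article names below are made up for demonstration.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

store = {
    "Peer Production":          [0.9, 0.1, 0.0],
    "Platform Cooperativism":   [0.7, 0.6, 0.1],
    "Composting":               [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "commons-based peer production".
query_vec = [0.8, 0.3, 0.0]

ranked = sorted(store, key=lambda t: cosine(store[t], query_vec), reverse=True)
```

Topically related articles land near the top of `ranked` even when they share no exact keywords with the query, which is what makes this better than plain text search for a wiki this size.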
### 5. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings
```

### 6. Run the Server

```bash
python -m src.api
```

Visit http://localhost:8420/ui for the web interface.
## Docker Deployment

For production deployment on the RS 8000:

```bash
# Build and run
docker compose up -d --build

# Check logs
docker compose logs -f

# Access at http://localhost:8420/ui
# Or via Traefik at https://wiki-ai.jeffemmett.com
```
## API Endpoints

### Chat

```bash
# Ask a question
curl -X POST http://localhost:8420/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is commons-based peer production?"}'
```
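The same request can be made from Python with the standard library. The URL and JSON body match the curl example above; the `ask` helper and the shape of the response are assumptions, not documented API behavior.

```python
# Sketch of calling the /chat endpoint from Python. The endpoint path and
# request body mirror the curl example; everything else is illustrative.
import json
import urllib.request

API_BASE = "http://localhost:8420"

def build_chat_request(query):
    """Build the POST request for /chat without sending it."""
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(query):
    """Send the question to a running server and return the decoded JSON."""
    with urllib.request.urlopen(build_chat_request(query)) as resp:
        return json.loads(resp.read())

req = build_chat_request("What is commons-based peer production?")
```

With the server from step 6 running, `ask("What is commons-based peer production?")` returns the parsed JSON response.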
### Ingress

```bash
# Process an external article
curl -X POST http://localhost:8420/ingress \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article-about-cooperatives"}'
```

### Review Queue

```bash
# Get all items in the review queue
curl http://localhost:8420/review

# Approve a draft article
curl -X POST http://localhost:8420/review/action \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/item.json", "item_type": "draft", "item_index": 0, "action": "approve"}'
```
### Search

```bash
# Direct vector search
curl "http://localhost:8420/search?q=cooperative%20economics&n=10"

# List article titles
curl "http://localhost:8420/articles?limit=100"
```
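When building search URLs programmatically, letting `urllib` handle percent-encoding avoids bugs with spaces and special characters in queries. The `q` and `n` parameter names follow the curl example above; nothing else is assumed.

```python
# Build the /search query string with proper URL encoding, rather than
# concatenating strings by hand. urlencode escapes spaces and punctuation.
from urllib.parse import urlencode

def search_url(query, n=10, base="http://localhost:8420"):
    """Return a fully encoded /search URL for the given query."""
    return f"{base}/search?" + urlencode({"q": query, "n": n})

url = search_url("cooperative economics", n=10)
```

Note that `urlencode` emits `+` for spaces rather than `%20`; both are valid in query strings and the server decodes them identically.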
## Hybrid AI Routing

The system routes each task to either the local LLM (Ollama) or the cloud LLM (Claude), trading cost against output quality:

| Task | Default LLM | Reasoning |
|------|-------------|-----------|
| Chat Q&A | Ollama | Fast, free, good enough for retrieval-based answers |
| Content Analysis | Claude | Better at extracting topics and identifying wiki relevance |
| Draft Generation | Claude | Higher-quality article writing |
| Embeddings | Local (sentence-transformers) | Fast, free, optimized for semantic search |

Configure in `.env`:

```
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true
```
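The routing rule the table describes can be sketched as a small function. The environment flag names come from the `.env` snippet above, but the function, task names, and the exact flag-to-task mapping are hypothetical, not the implementation in `src/llm.py`.

```python
# Hypothetical sketch of per-task LLM routing driven by the .env flags.
# Task names ("chat", "analysis", "draft", "embeddings") are illustrative.
import os

def pick_llm(task):
    """Return 'ollama' or 'claude' for a task, honoring the .env flags."""
    use_claude_drafts = os.getenv("USE_CLAUDE_FOR_DRAFTS", "true") == "true"
    use_ollama_chat = os.getenv("USE_OLLAMA_FOR_CHAT", "true") == "true"
    if task == "chat":
        return "ollama" if use_ollama_chat else "claude"
    if task in ("analysis", "draft"):
        return "claude" if use_claude_drafts else "ollama"
    return "ollama"  # embeddings and anything unrecognized stay local
```

Setting `USE_CLAUDE_FOR_DRAFTS=false` would route analysis and drafting to Ollama as well, making the system fully local at the cost of draft quality.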
## Project Structure

```
p2pwiki-content/
├── src/
│   ├── api.py            # FastAPI backend
│   ├── config.py         # Configuration settings
│   ├── embeddings.py     # Vector store (ChromaDB)
│   ├── ingress.py        # Article scraper & analyzer
│   ├── llm.py            # LLM client (Ollama/Claude)
│   ├── parser.py         # MediaWiki XML parser
│   └── rag.py            # RAG chat system
├── web/
│   └── index.html        # Web UI
├── data/
│   ├── articles.json     # Parsed wiki content
│   ├── chroma/           # Vector store
│   └── review_queue/     # Pending ingress items
├── xmldump/              # MediaWiki XML dumps
├── docker-compose.yml
├── Dockerfile
└── pyproject.toml
```
## Content Coverage

The P2P Foundation Wiki contains ~23,000 articles covering:

- Peer-to-peer networks and culture
- Commons-based peer production (CBPP)
- Alternative economics and post-capitalism
- Cooperative business models
- Open source and free culture
- Collaborative governance
- Sustainability and ecology
## License

The wiki content comes from the P2P Foundation and remains under its original licenses.
The AI system code is provided as-is for educational purposes.