# P2P Wiki AI

An AI-augmented system for the P2P Foundation Wiki with two main features:

1. **Conversational Agent** - Ask questions about the 23,000+ wiki articles using RAG (Retrieval-Augmented Generation)
2. **Article Ingress Pipeline** - Drop article URLs to automatically analyze content, find matching wiki articles for citations, and generate draft articles

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       P2P Wiki AI System                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────┐        ┌─────────────────┐                │
│   │   Chat (Q&A)    │        │  Ingress Tool   │                │
│   │    via RAG      │        │   (URL Drop)    │                │
│   └────────┬────────┘        └────────┬────────┘                │
│            │                          │                         │
│            └─────────────┬────────────┘                         │
│                          ▼                                      │
│              ┌───────────────────────┐                          │
│              │    FastAPI Backend    │                          │
│              └───────────┬───────────┘                          │
│                          │                                      │
│           ┌──────────────┼──────────────┐                       │
│           ▼              ▼              ▼                       │
│    ┌──────────┐   ┌─────────────┐  ┌──────────────┐             │
│    │ ChromaDB │   │   Ollama/   │  │   Article    │             │
│    │ (Vector) │   │   Claude    │  │   Scraper    │             │
│    └──────────┘   └─────────────┘  └──────────────┘             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Quick Start

### 1. Prerequisites

- Python 3.10+
- [Ollama](https://ollama.ai) installed locally (or access to a remote Ollama server)
- Optional: an Anthropic API key for Claude (higher-quality article drafts)

### 2. Install Dependencies

```bash
cd /home/jeffe/Github/p2pwiki-content
pip install -e .
```

### 3. Parse Wiki Content

Convert the MediaWiki XML dumps to searchable JSON:

```bash
python -m src.parser
```

This creates `data/articles.json` with all parsed articles (~23,000 pages).

### 4. Generate Embeddings

Create the vector store for semantic search:

```bash
python -m src.embeddings
```

This creates the ChromaDB vector store in `data/chroma/`. It takes a few minutes.

### 5. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings
```
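As a sketch, a filled-in `.env` might look like the following. Only `USE_CLAUDE_FOR_DRAFTS` and `USE_OLLAMA_FOR_CHAT` are confirmed by this README (see the Hybrid AI Routing section); the other variable names are assumptions and should be checked against `.env.example`:

```
# Routing flags documented in the Hybrid AI Routing section
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true

# Assumed variable names; verify against .env.example
ANTHROPIC_API_KEY=sk-ant-...          # optional, enables Claude drafts
OLLAMA_HOST=http://localhost:11434    # local or remote Ollama server
```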
### 6. Run the Server

```bash
python -m src.api
```

Visit http://localhost:8420/ui for the web interface.

## Docker Deployment

For production deployment on the RS 8000:

```bash
# Build and run
docker compose up -d --build

# Check logs
docker compose logs -f

# Access at http://localhost:8420/ui
# Or via Traefik at https://wiki-ai.jeffemmett.com
```

## API Endpoints

### Chat

```bash
# Ask a question
curl -X POST http://localhost:8420/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is commons-based peer production?"}'
```

### Ingress

```bash
# Process an external article
curl -X POST http://localhost:8420/ingress \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article-about-cooperatives"}'
```

### Review Queue

```bash
# Get all items in the review queue
curl http://localhost:8420/review

# Approve a draft article
curl -X POST http://localhost:8420/review/action \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/item.json", "item_type": "draft", "item_index": 0, "action": "approve"}'
```

### Search

```bash
# Direct vector search
curl "http://localhost:8420/search?q=cooperative%20economics&n=10"

# List article titles
curl "http://localhost:8420/articles?limit=100"
```

## Hybrid AI Routing

The system routes each task to either the local LLM (Ollama) or the cloud LLM (Claude):

| Task | Default LLM | Reasoning |
|------|-------------|-----------|
| Chat Q&A | Ollama | Fast, free, good enough for retrieval-based answers |
| Content Analysis | Claude | Better at extracting topics and identifying wiki relevance |
| Draft Generation | Claude | Higher-quality article writing |
| Embeddings | Local (sentence-transformers) | Fast, free, optimized for semantic search |

Configure in `.env`:

```
USE_CLAUDE_FOR_DRAFTS=true
USE_OLLAMA_FOR_CHAT=true
```
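The routing table above can be sketched as a small dispatch function. This is an illustrative assumption, not the project's actual `src/llm.py` implementation; only the two flag names come from this README:

```python
import os

def pick_backend(task: str) -> str:
    """Map a task to an LLM backend per the routing table (illustrative sketch)."""
    # Flag names USE_CLAUDE_FOR_DRAFTS / USE_OLLAMA_FOR_CHAT are from the .env
    # example above; the defaults here mirror that example.
    use_claude_for_drafts = os.getenv("USE_CLAUDE_FOR_DRAFTS", "true") == "true"
    use_ollama_for_chat = os.getenv("USE_OLLAMA_FOR_CHAT", "true") == "true"

    if task == "chat":
        return "ollama" if use_ollama_for_chat else "claude"
    if task in ("analysis", "draft"):
        return "claude" if use_claude_for_drafts else "ollama"
    if task == "embeddings":
        # Embeddings always stay local per the routing table
        return "sentence-transformers"
    raise ValueError(f"unknown task: {task}")
```

With the default flags, `pick_backend("chat")` returns `"ollama"` and `pick_backend("draft")` returns `"claude"`.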
## Project Structure

```
p2pwiki-content/
├── src/
│   ├── api.py            # FastAPI backend
│   ├── config.py         # Configuration settings
│   ├── embeddings.py     # Vector store (ChromaDB)
│   ├── ingress.py        # Article scraper & analyzer
│   ├── llm.py            # LLM client (Ollama/Claude)
│   ├── parser.py         # MediaWiki XML parser
│   └── rag.py            # RAG chat system
├── web/
│   └── index.html        # Web UI
├── data/
│   ├── articles.json     # Parsed wiki content
│   ├── chroma/           # Vector store
│   └── review_queue/     # Pending ingress items
├── xmldump/              # MediaWiki XML dumps
├── docker-compose.yml
├── Dockerfile
└── pyproject.toml
```

## Content Coverage

The P2P Foundation Wiki contains ~23,000 articles covering:

- Peer-to-peer networks and culture
- Commons-based peer production (CBPP)
- Alternative economics and post-capitalism
- Cooperative business models
- Open source and free culture
- Collaborative governance
- Sustainability and ecology

## License

The wiki content comes from the P2P Foundation and remains under its respective licenses. The AI system code is provided as-is for educational purposes.