erowid-bot

Commit Graph

Author	SHA1	Message	Date
Jeff Emmett	80b398643e	Drastically reduce prompt size for CPU inference speed - Cut context to 512 tokens, max output to 128 - Only 2 retrieval chunks of 150 chars each (no headers) - Keep only last 2 conversation messages - Minimized system prompt overhead Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 01:47:06 -07:00
Jeff Emmett	08be7716f9	Aggressively optimize Ollama CPU inference speed - Warm up both models on startup with keep_alive=24h (no cold starts) - Use 16 threads for inference (server has 20 cores) - Reduce context window to 1024 tokens, max output to 256 - Persistent httpx client for embedding calls (skip TCP handshake) - Trim RAG chunks to 300 chars, history to 4 messages - Shorter system prompt and context wrapper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 01:12:04 -07:00
Jeff Emmett	3215283f97	Speed up bot: use llama3.2:1b, reduce context, limit tokens - Switch default model from llama3.1:8b to llama3.2:1b (2x faster on CPU) - Limit Ollama context to 2048 tokens and max output to 512 tokens - Reduce retrieval chunks from 4 to 3, chunk content from 800 to 500 chars - Trim conversation history from 10 to 6 messages - Shorten system prompt to reduce input tokens Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 19:44:04 -07:00
Jeff Emmett	d09d065d08	Initial commit: Erowid conversational bot RAG-powered chatbot that indexes Erowid's experience reports and substance info, making them searchable via natural conversation. Built with FastAPI, PostgreSQL+pgvector, Ollama embeddings, and streaming LLM responses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 01:19:49 +00:00

4 Commits