erowid-bot/app
Jeff Emmett 08be7716f9 Aggressively optimize Ollama CPU inference speed
- Warm up both models on startup with keep_alive=24h (no cold starts)
- Use 16 threads for inference (server has 20 cores)
- Reduce context window to 1024 tokens, max output to 256
- Persistent httpx client for embedding calls (skip TCP handshake)
- Trim RAG chunks to 300 chars, history to 4 messages
- Shorter system prompt and context wrapper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 01:12:04 -07:00
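The warm-up and option changes described in the commit can be sketched against Ollama's HTTP API (`POST /api/generate`, which loads a model when given an empty prompt and accepts `keep_alive` plus an `options` map). This is a minimal stdlib-only sketch, not the app's actual code: the base URL, the helper names, and the use of `llama3.2:1b` (mentioned in the earlier `config.py` commit) are assumptions; the numeric values come straight from the commit message.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port; an assumption here


def build_warmup_payload(model: str) -> dict:
    """Request body that loads `model` and pins it in memory for 24 hours.

    Option values mirror the commit message: 16 inference threads,
    a 1024-token context window, and at most 256 generated tokens.
    """
    return {
        "model": model,
        "prompt": "",           # empty prompt: load the model, generate nothing
        "keep_alive": "24h",    # keep weights resident -> no cold starts
        "options": {
            "num_thread": 16,   # server has 20 cores; leave headroom for the app
            "num_ctx": 1024,    # reduced context window
            "num_predict": 256, # cap on output tokens
        },
    }


def warm_up(models: list[str]) -> None:
    """POST one warm-up request per model at startup (network call)."""
    for model in models:
        req = urllib.request.Request(
            f"{OLLAMA_URL}/api/generate",
            data=json.dumps(build_warmup_payload(model)).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).read()
```

Calling `warm_up(["llama3.2:1b", "<embedding-model>"])` once at startup trades a few seconds at boot for consistently fast first responses, since the model stays loaded for the full `keep_alive` window.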
scraper        Initial commit: Erowid conversational bot                    2026-02-17 01:19:49 +00:00
static         Initial commit: Erowid conversational bot                    2026-02-17 01:19:49 +00:00
__init__.py    Initial commit: Erowid conversational bot                    2026-02-17 01:19:49 +00:00
config.py      Speed up bot: use llama3.2:1b, reduce context, limit tokens  2026-02-16 19:44:04 -07:00
database.py    Initial commit: Erowid conversational bot                    2026-02-17 01:19:49 +00:00
embeddings.py  Aggressively optimize Ollama CPU inference speed             2026-02-17 01:12:04 -07:00
llm.py         Aggressively optimize Ollama CPU inference speed             2026-02-17 01:12:04 -07:00
main.py        Aggressively optimize Ollama CPU inference speed             2026-02-17 01:12:04 -07:00
models.py      Initial commit: Erowid conversational bot                    2026-02-17 01:19:49 +00:00
rag.py         Aggressively optimize Ollama CPU inference speed             2026-02-17 01:12:04 -07:00
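The commit's "persistent httpx client for embedding calls (skip TCP handshake)" point, which the listing attributes to `embeddings.py`, amounts to reusing one connection across requests instead of opening a fresh socket per call. The app reportedly uses `httpx`; this sketch shows the same idea with stdlib `http.client` so it is self-contained. The class name, the host/port defaults, and the `nomic-embed-text` model name are placeholders, not taken from the repo.

```python
import json
from http.client import HTTPConnection


class EmbeddingClient:
    """Reuses a single TCP connection to Ollama for all embedding calls.

    Opening the connection once and keeping it alive avoids repeating the
    TCP handshake on every request -- the same win the commit gets from a
    persistent httpx client.
    """

    def __init__(self, host: str = "localhost", port: int = 11434,
                 model: str = "nomic-embed-text"):  # placeholder model name
        self.conn = HTTPConnection(host, port)  # connects lazily, then reused
        self.model = model

    def request_body(self, text: str) -> bytes:
        # Ollama's /api/embeddings endpoint takes {"model": ..., "prompt": ...}
        return json.dumps({"model": self.model, "prompt": text}).encode()

    def embed(self, text: str) -> list[float]:
        self.conn.request(
            "POST", "/api/embeddings",
            body=self.request_body(text),
            headers={"Content-Type": "application/json"},
        )
        resp = self.conn.getresponse()
        return json.loads(resp.read())["embedding"]
```

For RAG workloads that embed every user query, the handshake savings compound: one long-lived client per process instead of a connect/teardown cycle per message.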