- Cut context to 512 tokens and max output to 128
- Use only 2 retrieval chunks of 150 chars each (no headers)
- Keep only the last 2 conversation messages
- Minimize system prompt to reduce overhead
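The budget above can be sketched as a small settings object plus a history trimmer; the names (`RAGSettings`, `trim_history`) are illustrative, not the project's actual code.

```python
from dataclasses import dataclass

@dataclass
class RAGSettings:
    """Aggressive token budget for low-latency CPU inference (illustrative)."""
    num_ctx: int = 512         # total context window passed to the model
    num_predict: int = 128     # cap on generated tokens
    top_k_chunks: int = 2      # retrieval chunks per query
    chunk_chars: int = 150     # characters kept per chunk (headers stripped)
    history_messages: int = 2  # most recent conversation turns retained

def trim_history(messages: list[dict], settings: RAGSettings) -> list[dict]:
    # Keep only the most recent turns so history fits the context budget
    return messages[-settings.history_messages:]
```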
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch default model from llama3.1:8b to llama3.2:1b (2x faster on CPU)
- Limit Ollama context to 2048 tokens and max output to 512 tokens
- Reduce retrieval chunks from 4 to 3, chunk content from 800 to 500 chars
- Trim conversation history from 10 to 6 messages
- Shorten system prompt to reduce input tokens
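The limits above translate to something like the following sketch. `num_ctx` and `num_predict` are real Ollama option names; the helper functions themselves are hypothetical, not the project's actual code.

```python
def trim_chunks(chunks: list[str], max_chunks: int = 3, max_chars: int = 500) -> list[str]:
    # Apply the retrieval limits: at most 3 chunks, 500 chars each
    return [c[:max_chars] for c in chunks[:max_chunks]]

def build_ollama_request(messages: list[dict], model: str = "llama3.2:1b") -> dict:
    # Builds the payload for Ollama's /api/chat endpoint; num_ctx caps the
    # context window and num_predict caps generated tokens
    return {
        "model": model,
        "messages": messages[-6:],  # keep only the last 6 conversation turns
        "stream": True,
        "options": {"num_ctx": 2048, "num_predict": 512},
    }
```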

RAG-powered chatbot that indexes Erowid's experience reports and substance
info, making them searchable via natural conversation. Built with FastAPI,
PostgreSQL+pgvector, Ollama embeddings, and streaming LLM responses.
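The retrieval flow described above might look roughly like this sketch: embed the query via Ollama, then do a nearest-neighbor search in pgvector. The table and column names and the embedding model are assumptions; `<=>` is pgvector's actual cosine-distance operator.

```python
# Hypothetical table/column names; <=> is pgvector's cosine-distance operator
SEARCH_SQL = """
SELECT content
FROM chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""

def build_embed_request(text: str, model: str = "nomic-embed-text") -> dict:
    # Payload for Ollama's /api/embeddings endpoint ({"model", "prompt"});
    # the embedding model name here is an assumption
    return {"model": model, "prompt": text}
```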